We consider the problem of source coding subject to a fidelity criterion
for a simple network connecting a single source with two receivers via a 
common channel and two private channels. The region of attainable rates 
is formulated as an information-theoretic minimization. Several upper 
and lower bounds are developed and shown to actually yield a portion of 
the desired region in certain cases. 

I. INTRODUCTION 

1.1 Informal statement of the problem 

To fix ideas, let us consider the following problem. Suppose that we
are given a data source whose output is a sequence $U_1, U_2, \cdots$ that
appears at the source output at the rate of 1 per second. The $\{U_k\}_{k=1}^{\infty}$
is a sequence of independent copies of the discrete random variable
$U$, with probability distribution $\Pr\{U = u\} = Q(u)$, $u \in \mathcal{U}$, a finite
set. Our task is to transmit this data sequence over a communication
channel having a capacity of $C$ bits per second so that it is represented
at the output as $\hat{U}_1, \hat{U}_2, \cdots$, with $\hat{U}_k \in \mathcal{U}$. We assume that the data are
transmitted over the channel in blocks of length $n$, and allow processing at
both the channel input and output (encoding and decoding). We define
the "error rate" as

$$\Delta = E\,\frac{1}{n}\sum_{k=1}^{n} d_H(U_k, \hat{U}_k), \tag{1a}$$

where

$$d_H(u, \hat{u}) = \begin{cases} 0, & u = \hat{u}, \\ 1, & u \neq \hat{u}, \end{cases} \tag{1b}$$

is the Hamming metric. Thus, $\Delta$ is the average fraction of data digits
delivered in error.

The question we pose is: What is the smallest capacity $C$ such that
(for $n$ sufficiently large) we can transmit the data through the channel
and achieve an arbitrarily small $\Delta$? The well-known answer to the
question is that the minimum capacity $C$ is the entropy $H(U)$, defined by*

$$H(U) = -\sum_{u \in \mathcal{U}} Q(u) \log Q(u). \tag{2}$$

Now consider the case where the random variable $U$ is a pair $(X, Y)$,
where $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. We have

$$Q(u) = Q(x, y) = \Pr\{X = x, Y = y\},$$

and

$$H(U) = H(X, Y) = -\sum_{x,y} Q(x, y) \log Q(x, y).$$

Setting $U = (X, Y)$, $\Delta$ [as defined in (1)] is the fraction of pairs
delivered in error. Thus, we conclude that $H(X, Y)$ is the minimum
channel capacity required to transmit the source output $\{(X_k, Y_k)\}$
with the error rate $\Delta$ arbitrarily small.

Next, let us assume that, as above, $U = (X, Y)$, but that it is only
required to transmit the sequence $\{X_k\}$ through a channel having a
capacity $C_1$, and to deliver it at the channel output as $\{\hat{X}_k\}$. Let

$$\Delta_X = E\,\frac{1}{n}\sum_{k=1}^{n} d_H(X_k, \hat{X}_k)$$

be the error rate for a system with block coding of block length $n$.
The special assumption here is that the random sequence $\{Y_k\}_{k=1}^{\infty}$ is
available to the encoder and the decoder. See Fig. 1.



* All logarithms in this paper are assumed to be taken to the base 2.

Fig. 1. Source coding with side information.



Again we ask: What is the minimum capacity $C_1$ required to transmit
$\{X_k\}$ with $\Delta_X$ arbitrarily small (with $n$ sufficiently large)? The
answer (Refs. 1 and 2) is that the minimum $C_1$ is the "conditional entropy,"
$H(X|Y)$, defined by

$$H(X|Y) = -\sum_{x,y} Q(x, y) \log \frac{Q(x, y)}{Q_Y(y)} = -\sum_{y} Q_Y(y) \sum_{x} Q_{X|Y}(x|y) \log Q_{X|Y}(x|y), \tag{3a}$$

where

$$Q_Y(y) = \Pr\{Y = y\} = \sum_{x \in \mathcal{X}} Q(x, y) \tag{3b}$$

and

$$Q_{X|Y}(x|y) = \frac{Q(x, y)}{Q_Y(y)} = \Pr\{X = x \mid Y = y\}. \tag{3c}$$

Note that $H(X|Y) + H(Y) = H(X, Y)$.
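As a numerical illustration of (3) and of the identity $H(X|Y) + H(Y) = H(X, Y)$, the following Python sketch evaluates these quantities for a small joint distribution. The distribution `Q_example` and the function names are hypothetical, chosen only for illustration; they do not appear in the paper.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities (zeros skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy_x_given_y(Q):
    """H(X|Y) as in (3a); Q is a dict {(x, y): probability}."""
    Qy = {}
    for (x, y), p in Q.items():
        Qy[y] = Qy.get(y, 0.0) + p  # marginal of Y, as in (3b)
    return -sum(p * math.log2(p / Qy[y]) for (x, y), p in Q.items() if p > 0)

# Hypothetical joint distribution, for illustration only.
Q_example = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

H_xy = entropy(Q_example.values())
H_y = entropy([0.4 + 0.2, 0.1 + 0.3])  # Q_Y(0) and Q_Y(1)
H_x_given_y = conditional_entropy_x_given_y(Q_example)
```

The sum `H_x_given_y + H_y` should agree with `H_xy` up to floating-point rounding.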

Let us remark that the above still holds if, instead of delivering
$\{Y_k\}$ to the decoder, we delivered a sequence $\{\hat{Y}_k\}$, where
$\Delta_Y = E(1/n)\sum_{k=1}^{n} d_H(Y_k, \hat{Y}_k)$ can be made arbitrarily small. Thus,
the capacity of the "side channel" must be at least $H(Y)$.

Finally, we turn our attention to the problem to which this paper is
devoted. Let the source output be $\{(X_k, Y_k)\}_{k=1}^{\infty}$, as above. We assume
here, however, that there are two receivers. Receiver 1 is interested in
obtaining a reproduction $\{\hat{X}_k\}$ of the sequence $\{X_k\}$, and receiver 2 is
interested in obtaining a reproduction $\{\hat{Y}_k\}$ of the sequence $\{Y_k\}$.
Assume further that a network consisting of three channels is available,
as in Fig. 2. The first of these channels is a "common" channel
(with capacity $C_0$) that connects the transmitter to both receivers, and
the other two are "private" channels that connect the transmitter to
each of the two receivers (with capacities $C_1$ and $C_2$). Assuming that
we use block coding with block length $n$, the error rates are

$$\Delta_X = E\,\frac{1}{n}\sum_{k=1}^{n} d_H(X_k, \hat{X}_k), \tag{4a}$$

$$\Delta_Y = E\,\frac{1}{n}\sum_{k=1}^{n} d_H(Y_k, \hat{Y}_k). \tag{4b}$$

Fig. 2. Source coding for a network.



We say that a "rate-triple" $(R_0, R_1, R_2)$ is achievable if, for any triple
of channel capacities $(C_0, C_1, C_2)$ for which $C_i > R_i$ ($i = 0, 1, 2$) and
any $\epsilon > 0$, transmission over the network of Fig. 2 (with these
capacities) is possible (with $n$ sufficiently large) with $\Delta_X, \Delta_Y \le \epsilon$.
Our problem is the determination of the set $\mathcal{R}$ of achievable rate-triples.
Before stating our results, we digress to give a formal and precise
statement of the problem as well as some other specialized information.
This digression can be omitted by the casual reader.

1.2 Digression — formal statement of the problem 

Let $\{(X_k, Y_k)\}_{k=1}^{\infty}$ be a sequence of independent drawings of a pair
of random variables $(X, Y)$, $X \in \mathcal{X}$, $Y \in \mathcal{Y}$. $\mathcal{X}$ and $\mathcal{Y}$ are finite sets
and $\Pr\{X = x, Y = y\} = Q(x, y)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$. The marginal
distributions are

$$Q_X(x) = \sum_{y \in \mathcal{Y}} Q(x, y) \quad \text{and} \quad Q_Y(y) = \sum_{x \in \mathcal{X}} Q(x, y).$$

Often, when the random variables are clear from the context, we write
$Q_X(x)$ as $Q(x)$, etc. Define, for $m = 1, 2, \cdots$, the set

$$I_m = \{0, 1, 2, \cdots, m - 1\}. \tag{5}$$

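An encoder with parameters $(n, M_0, M_1, M_2)$ is a mapping

$$f_E : \mathcal{X}^n \times \mathcal{Y}^n \to I_{M_0} \times I_{M_1} \times I_{M_2}. \tag{6}$$

Given an encoder, a decoder is a pair of mappings

$$f_D^{(X)} : I_{M_0} \times I_{M_1} \to \mathcal{X}^n, \tag{7a}$$

$$f_D^{(Y)} : I_{M_0} \times I_{M_2} \to \mathcal{Y}^n. \tag{7b}$$

An encoder-decoder with parameters $(n, M_0, M_1, M_2)$ is applied as
follows. Let

$$f_E(\mathbf{X}, \mathbf{Y}) = (S_0, S_1, S_2), \tag{8a}$$

where

$$\mathbf{X} = (X_1, \cdots, X_n) \quad \text{and} \quad \mathbf{Y} = (Y_1, \cdots, Y_n).$$

Then let

$$\hat{\mathbf{X}} = f_D^{(X)}(S_0, S_1), \tag{8b}$$

$$\hat{\mathbf{Y}} = f_D^{(Y)}(S_0, S_2). \tag{8c}$$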

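The resulting error rate is

$$\Delta = \max(\Delta_X, \Delta_Y), \tag{9a}$$

where

$$\Delta_X = E\,\frac{1}{n}\sum_{k=1}^{n} d_H(X_k, \hat{X}_k), \tag{9b}$$

$$\Delta_Y = E\,\frac{1}{n}\sum_{k=1}^{n} d_H(Y_k, \hat{Y}_k), \tag{9c}$$

$d_H(\cdot, \cdot)$ is defined by (1b), and $\hat{X}_k$, $\hat{Y}_k$ are the $k$th coordinates of $\hat{\mathbf{X}}$
and $\hat{\mathbf{Y}}$, respectively. The Hamming distance $D_H(\mathbf{u}, \mathbf{v})$ between the
$n$-vectors $\mathbf{u}$ and $\mathbf{v}$ is the number of positions in which $\mathbf{u}$ and $\mathbf{v}$ differ.
Thus, $\Delta_X = E(1/n)D_H(\mathbf{X}, \hat{\mathbf{X}})$ and $\Delta_Y = E(1/n)D_H(\mathbf{Y}, \hat{\mathbf{Y}})$.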
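The error rate of (9) can be estimated empirically by averaging the normalized Hamming distance over observed blocks. The following Python sketch illustrates this under stated assumptions: the sample blocks are invented, the function names are hypothetical, and the sample mean merely stands in for the expectation $E$.

```python
def hamming_distance(u, v):
    """D_H(u, v): number of positions in which the n-vectors u and v differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(1 for a, b in zip(u, v) if a != b)

def average_error_rate(block_pairs):
    """Sample mean of (1/n) D_H(u, u_hat) over (source block, reproduction) pairs."""
    rates = [hamming_distance(u, u_hat) / len(u) for u, u_hat in block_pairs]
    return sum(rates) / len(rates)

# Hypothetical source blocks paired with their reproductions (n = 4).
x_blocks = [((0, 1, 1, 0), (0, 1, 0, 0)), ((1, 1, 0, 0), (1, 1, 0, 0))]
y_blocks = [((1, 0, 1, 1), (1, 0, 1, 1)), ((0, 0, 1, 0), (1, 0, 1, 1))]

delta_x = average_error_rate(x_blocks)  # estimate of (9b)
delta_y = average_error_rate(y_blocks)  # estimate of (9c)
delta = max(delta_x, delta_y)           # (9a)
```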

The correspondence between the encoder-decoder pair (or "code")
as defined here and the communication system of Fig. 2 should be
clear. Note that the capacities of the channels in that diagram must
be at least $C_i = (1/n) \log_2 M_i$ ($i = 0, 1, 2$).

A triple $(R_0, R_1, R_2)$ is said to be achievable if, for arbitrary
$\epsilon > 0$, there exists (for $n$ sufficiently large) a code with parameters
$(n, M_0, M_1, M_2)$ with $M_i \le 2^{n(R_i + \epsilon)}$, $i = 0, 1, 2$, and error rate $\Delta \le \epsilon$.
We define $\mathcal{R}$ as the set of achievable rates. Our main problem is to
ascertain the region $\mathcal{R}$.

It follows from the definition that $\mathcal{R}$ is a closed subset of Euclidean
three-space and that $\mathcal{R}$ has the property that

$$(R_0, R_1, R_2) \in \mathcal{R} \Rightarrow (R_0 + \epsilon_0, R_1 + \epsilon_1, R_2 + \epsilon_2) \in \mathcal{R}, \tag{10}$$

$\epsilon_i \ge 0$, $i = 0, 1, 2$. The region $\mathcal{R}$ is therefore completely defined by
giving its lower boundary $\overline{\mathcal{R}}$, where

$$\overline{\mathcal{R}} = \{(R_0, R_1, R_2) \in \mathcal{R} : (\hat{R}_0, \hat{R}_1, \hat{R}_2) \in \mathcal{R},\ \hat{R}_i \le R_i\ (i = 0, 1, 2) \Rightarrow \hat{R}_i = R_i\ (i = 0, 1, 2)\}. \tag{11}$$

It follows immediately that $\overline{\mathcal{R}}$ too is closed.

It can also be verified by a simple "time-sharing" argument that $\mathcal{R}$
is convex (see appendix). This leads us to the following equivalent
formulation of the problem. Let $\alpha_i \ge 0$, $i = 0, 1, 2$ be arbitrary. Then
define

$$T_1(\alpha_0, \alpha_1, \alpha_2) = \min_{(R_0, R_1, R_2) \in \mathcal{R}} (\alpha_0 R_0 + \alpha_1 R_1 + \alpha_2 R_2).$$

Then it follows from the convexity of $\mathcal{R}$ that the lower boundary $\overline{\mathcal{R}}$
is the upper envelope of the family of planes $\sum_{i=0}^{2} \alpha_i R_i = T_1(\alpha_0, \alpha_1, \alpha_2)$.

We can think of $T_1(\alpha_0, \alpha_1, \alpha_2)$ as the minimum cost of transmitting,
using a code with rate-triple $(R_0, R_1, R_2)$ over the network of Fig. 2,
when the cost of transmitting a bit per second over the common channel
is $\alpha_0$ and the costs of transmitting a bit per second over the private
channels to receivers 1 and 2 are $\alpha_1$ and $\alpha_2$, respectively. Now, since
information sent over the common channel (in Fig. 2) can alternatively
be sent over both private channels, it is never necessary to consider
the case where the sum of the costs of a bit per second on the private
channels $\alpha_1 + \alpha_2 < \alpha_0$, the cost of a bit per second on the common
channel. Similarly, we need never consider the cases where $\alpha_1 > \alpha_0$, or
$\alpha_2 > \alpha_0$, since information transmitted over a private channel can
alternatively be sent over the common channel. Since we can normalize
$\alpha_0$ as unity, the following theorem should be plausible. A complete
proof is given in the appendix.

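For $\mathbf{R} = (R_0, R_1, R_2)$ satisfying $R_i \ge 0$, and $\boldsymbol{\alpha} = (\alpha_1, \alpha_2)$ arbitrary,
let the "cost" be defined by

$$C(\boldsymbol{\alpha}, \mathbf{R}) = R_0 + \alpha_1 R_1 + \alpha_2 R_2. \tag{12}$$

With $\boldsymbol{\alpha}$ held fixed, let

$$T(\boldsymbol{\alpha}) = \min_{\mathbf{R} \in \mathcal{R}} C(\boldsymbol{\alpha}, \mathbf{R}). \tag{13}$$

The indicated minimum exists because $\mathcal{R}$ is closed. For $\boldsymbol{\alpha}$ arbitrary,
let $\mathcal{S}(\boldsymbol{\alpha})$ be the set of $\mathbf{R} \in \mathcal{R}$ that achieve $T(\boldsymbol{\alpha}) = C(\boldsymbol{\alpha}, \mathbf{R})$.

Theorem 1:

(i) $\overline{\mathcal{R}} \subseteq \bigcup_{\boldsymbol{\alpha} \in \mathcal{A}} \mathcal{S}(\boldsymbol{\alpha})$,

(ii) $\bigcup_{\boldsymbol{\alpha} \in \mathcal{A}'} \mathcal{S}(\boldsymbol{\alpha}) \subseteq \overline{\mathcal{R}}$,

where the boundary $\overline{\mathcal{R}}$ is defined in (11), $\mathcal{A}$ is the set of $\boldsymbol{\alpha} = (\alpha_1, \alpha_2)$ that
satisfy

$$0 \le \alpha_1, \alpha_2 \le 1, \quad \alpha_1 + \alpha_2 \ge 1,$$

and $\mathcal{A}'$ is $\mathcal{A}$ with the elements (0, 1) and (1, 0) deleted.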

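Remarks:

(1) (0, 1) and (1, 0) are the only pairs in $\mathcal{A}$ with zero elements. Thus,
$\mathcal{A}$ and $\mathcal{A}'$ are nearly identical.

(2) The theorem implies that $\overline{\mathcal{R}}$ is the upper envelope in $(R_0, R_1, R_2)$-space
of the family of planes defined by

$$R_0 + \alpha_1 R_1 + \alpha_2 R_2 = T(\boldsymbol{\alpha}), \quad \boldsymbol{\alpha} \in \mathcal{A}.$$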
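To make the cost of (12) and the weight set of Theorem 1 concrete, the following Python sketch evaluates the cost over a finite list of rate-triples. Because such a list can only be a subset of the achievable region, the resulting minimum is an upper bound on $T(\boldsymbol{\alpha})$ of (13), not its value. The triples and the function names are hypothetical and are assumed, for illustration only, to be achievable.

```python
def in_weight_set(alpha):
    """Membership test for the set of weight pairs (alpha1, alpha2) in Theorem 1."""
    a1, a2 = alpha
    return 0 <= a1 <= 1 and 0 <= a2 <= 1 and a1 + a2 >= 1

def cost(alpha, R):
    """Cost (12): R0 + alpha1 * R1 + alpha2 * R2."""
    a1, a2 = alpha
    R0, R1, R2 = R
    return R0 + a1 * R1 + a2 * R2

def cheapest_triple(alpha, triples):
    """Triple of least cost in a finite list; its cost upper-bounds T(alpha)."""
    return min(triples, key=lambda R: cost(alpha, R))

# Hypothetical triples, assumed achievable for the sake of the example.
candidates = [(1.5, 0.0, 0.0), (0.0, 1.0, 1.0), (1.0, 0.5, 0.0), (1.0, 0.0, 0.5)]

best_equal_weights = cheapest_triple((0.5, 0.5), candidates)
bound_equal_weights = cost((0.5, 0.5), best_equal_weights)
```

With equal weights of one half, the purely private triple is cheapest in this list; with both weights equal to one, the purely common triple costs no more than any other.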

1.3 Upper and lower bounds on $\mathcal{R}$
1.3.1 Lower bounds

We can immediately give some lower bounds to the region $\mathcal{R}$. We
state them as

Theorem 2: If $(R_0, R_1, R_2) \in \mathcal{R}$, then

(a) $R_0 + R_1 + R_2 \ge H(X, Y)$,

(b) $R_0 + R_1 \ge H(X)$,

(c) $R_0 + R_2 \ge H(Y)$.

Proof: Suppose that $(R_0, R_1, R_2) \in \mathcal{R}$. Then, for arbitrary $\epsilon > 0$, we
can (for sufficiently large block length $n$) reproduce $\{X_k\}$ and $\{Y_k\}$
with arbitrarily small $\Delta_X$, $\Delta_Y$, with capacity triple (in Fig. 2)

$$(C_0, C_1, C_2) = (R_0 + \epsilon, R_1 + \epsilon, R_2 + \epsilon).$$

That is, with a code with $M_i = 2^{nC_i}$, $i = 0, 1, 2$.

Since the total capacity of the three channels is $C_0 + C_1 + C_2$, we
must have

$$C_0 + C_1 + C_2 = R_0 + R_1 + R_2 + 3\epsilon \ge H(X, Y).$$

Letting $\epsilon \to 0$, we have established (a). Inequality (b) follows in an
identical way on observing that the common channel (with capacity
$C_0$) and the private channel to receiver 1 (with capacity $C_1$) together
transmit $\{X_k\}$. Inequality (c) follows similarly.

Let us remark that inequality (a) is an expression of the fact that
a communication system with the constraints imposed in Fig. 2
cannot perform better than in the "best of all possible worlds" situation
in which the receivers can collaborate. It is therefore called the
"Pangloss bound." The set of triples $(R_0, R_1, R_2)$ that satisfy $\sum_{i=0}^{2} R_i = H(X, Y)$
is called the "Pangloss plane." Corresponding to rate-triples
that lie on the intersection of $\mathcal{R}$ and the Pangloss plane, the
approximately $H(X, Y)$ bits per second that characterize $\{(X_k, Y_k)\}$
can be split up into three parts (corresponding to the information
transmitted over the three channels in our network) such that $\{(X_k, Y_k)\}$
can be essentially perfectly reconstructed by the receivers in the
network. In this situation, the information transmitted over the common
channel represents a kind of "core" process. Furthermore, the
smallest $R_0$ such that $(R_0, R_1, R_2) \in \mathcal{R}$ and lies on the Pangloss plane
(for some $R_1$, $R_2$) can be thought of as a measure of the "common
information" of $\{X_k\}$ and $\{Y_k\}$. This point is explored thoroughly in
Ref. 3.

1.3.2 Some easily achievable rate-triples 

We now assert that certain rate-triples are achievable. 

Theorem 3: The following triples belong to $\mathcal{R}$:

(A) $R_0 = H(X, Y)$, $R_1 = R_2 = 0$

(B) $R_0 = 0$, $R_1 = H(X)$, $R_2 = H(Y)$

(C) $R_0 = H(Y)$, $R_1 = H(X|Y)$, $R_2 = 0$

(D) $R_0 = H(X)$, $R_1 = 0$, $R_2 = H(Y|X)$.

Proof: To achieve (A), simply transmit $\{(X_k, Y_k)\}$ over the common
channel (and do not use the private channels). To achieve (B), transmit
$\{X_k\}$ and $\{Y_k\}$ over the private channels to receivers 1 and 2,
respectively (and do not use the common channel). To achieve (C),
transmit $\{Y_k\}$ over the common channel (requiring a capacity of
about $H(Y)$), and deliver $\{Y_k\}$ to receiver 1 to use as side information
for transmitting $\{X_k\}$ over the private channel to receiver 1. This will
require a capacity of about $H(X|Y)$. We do not use the private channel

1688 THE BELL SYSTEM TECHNICAL JOURNAL, NOVEMBER 1974 



to receiver 2. Triple (D) can be achieved as in (C) by reversing the roles
of $X$ and $Y$.

Let us remark that points (C) and (D) lie on the Pangloss plane
(i.e., they satisfy relation (a) of Theorem 2 with equality), since
$H(X) + H(Y|X) = H(Y) + H(X|Y) = H(X, Y)$. Furthermore, because
of the convexity of $\mathcal{R}$, all triples that are convex combinations of
triples (A) to (D) are also members of $\mathcal{R}$. The situation is summarized
in Fig. 3. The plane labeled "(a)" in the figure is the Pangloss plane
defined by $R_0 + R_1 + R_2 = H(X, Y)$. Theorem 2(a) states that the
region $\mathcal{R}$ (and therefore its lower boundary $\overline{\mathcal{R}}$) lies above this plane.
Similarly, Theorem 2(b, c) states that $\mathcal{R}$ and $\overline{\mathcal{R}}$ lie above the planes
labeled "(b)" and "(c)" in Fig. 3.

Now the points labeled "A," "B," "C," and "D" in the figure are
points (A) to (D) respectively in Theorem 3. As we mentioned previously,
points C and D (as well as A) lie on plane a. Thus (from the
convexity of $\mathcal{R}$), the triangle ADC lies in $\mathcal{R}$ and must therefore be part
of the lower boundary $\overline{\mathcal{R}}$. Further, since points D and B lie on plane b,
the line DB is part of $\overline{\mathcal{R}}$. Similarly, line BC is part of $\overline{\mathcal{R}}$. Finally, since
points B, C, and D are achievable, so are the points on the triangle
BCD. Thus, the only unknown part of the lower boundary $\overline{\mathcal{R}}$ lies in the
(upside-down) triangular pyramid with base BCD and apex at point
E (the intersection of planes a, b, c). The coordinates of point E are
easily seen to be $(R_0, R_1, R_2) = [I(X; Y), H(X|Y), H(Y|X)]$.

Fig. 3. Estimates of rate-region $\mathcal{R}$.
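The geometry just described can be checked numerically. The following Python sketch computes points A to E for a small joint distribution and evaluates, for each point, its slack with respect to the three inequalities of Theorem 2 (a slack of zero means the point lies on the corresponding plane). The distribution `Q` and all names are hypothetical, chosen only for illustration.

```python
import math

def H(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution Q(x, y), for illustration only.
Q = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

Qx, Qy = {}, {}
for (x, y), p in Q.items():
    Qx[x] = Qx.get(x, 0.0) + p
    Qy[y] = Qy.get(y, 0.0) + p

H_xy, H_x, H_y = H(Q.values()), H(Qx.values()), H(Qy.values())
I_xy = H_x + H_y - H_xy  # mutual information I(X;Y)

points = {
    "A": (H_xy, 0.0, 0.0),
    "B": (0.0, H_x, H_y),
    "C": (H_y, H_xy - H_y, 0.0),         # R1 = H(X|Y)
    "D": (H_x, 0.0, H_xy - H_x),         # R2 = H(Y|X)
    "E": (I_xy, H_xy - H_y, H_xy - H_x),
}

def slack(R):
    """Slack of R in Theorem 2(a), (b), (c); each entry is zero on that plane."""
    R0, R1, R2 = R
    return (R0 + R1 + R2 - H_xy, R0 + R1 - H_x, R0 + R2 - H_y)

slacks = {name: slack(R) for name, R in points.items()}
```

In this sketch A, C, and D have zero slack in (a), B has zero slack in (b) and (c), and E has zero slack in all three, consistent with E being the intersection of the three planes.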

Let us remark here that there is one special source distribution
$Q(x, y)$ for which point E is achievable.* In this case, the entire boundary
region $\overline{\mathcal{R}}$ lies on planes a, b, c. This special case is when $X$, $Y$ can
be written $X = (X', V)$, $Y = (Y', V)$, where $X'$ and $Y'$ are conditionally
independent given $V$. Then $I(X; Y) = H(V)$, $H(X|Y) = H(X'|V)$,
$H(Y|X) = H(Y'|V)$, so that point E is $R_0 = H(V)$,
$R_1 = H(X'|V)$, $R_2 = H(Y'|V)$. Clearly, if, in the system of Fig. 2,
we transmit $V$ over the common channel and $X'$ and $Y'$ over the two
private channels, we can reconstruct $X = (X', V)$ at receiver 1
and $Y = (Y', V)$ at receiver 2. This requires a capacity triple
$C_0 = H(V) + \epsilon$, $C_1 = H(X'|V) + \epsilon$, $C_2 = H(Y'|V) + \epsilon$ ($\epsilon > 0$
arbitrary), so that point E is in fact achievable.

We now give a characterization of the region $\mathcal{R}$ (and therefore of $\overline{\mathcal{R}}$)
in terms of information theoretic quantities. This characterization is,
in fact, the main result.

1.4 Characterization of $\mathcal{R}$: the main result

Suppose we are given $Q(x, y)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$, an arbitrary probability
function, where $\mathcal{X}$, $\mathcal{Y}$ are finite. Let $\mathcal{P}$ be the family of probability
functions $p(x, y, w)$, where $x \in \mathcal{X}$, $y \in \mathcal{Y}$, $w \in \mathcal{W}$, and $\mathcal{W}$ is
another finite set, for which

$$\sum_{w \in \mathcal{W}} p(x, y, w) = Q(x, y), \quad x \in \mathcal{X},\ y \in \mathcal{Y}. \tag{14}$$

Each $p \in \mathcal{P}$ defines discrete random variables $X$, $Y$, $W$ in an obvious
way. For each $p \in \mathcal{P}$, define the subset of Euclidean three-space

$$\mathcal{R}^{(p)} = \{(R_0, R_1, R_2) : R_0 \ge I(X, Y; W),\ R_1 \ge H(X|W),\ R_2 \ge H(Y|W)\}, \tag{15a}$$

and then let

$$\mathcal{R}^* = \Big( \bigcup_{p \in \mathcal{P}} \mathcal{R}^{(p)} \Big)^c, \tag{15b}$$

where $(\ )^c$ denotes set closure. Then our main result (the proof of
which is given in Section III) is

Theorem 4: $\mathcal{R} = \mathcal{R}^*$.

* M. Kaplan has shown that, in fact, this special case is the only one for which
point E is achievable.

Remarks:

(1) Let us define $\mathcal{P}_T$ as the family of "test channel" transition probabilities.
That is, $\mathcal{P}_T$ is the family of all $p_t(w|x, y)$ ($x \in \mathcal{X}$, $y \in \mathcal{Y}$, $w \in \mathcal{W}$),
where $\mathcal{W}$ is a finite set, and for each $(x, y)$, $p_t(w|x, y)$ is a probability
function on $\mathcal{W}$. Corresponding to each $p_t \in \mathcal{P}_T$, we have $p(x, y, w) = Q(x, y)p_t(w|x, y) \in \mathcal{P}$.
Further, for each $p \in \mathcal{P}$, we have $p_t(w|x, y) = [p(x, y, w)/Q(x, y)] \in \mathcal{P}_T$.
Thus $\mathcal{P}$ is in 1-1 correspondence with $\mathcal{P}_T$.

(2) Since $\mathcal{R}$ is convex, Theorem 4 implies that $\mathcal{R}^*$ is convex also.

(3) Theorem 4 can be invoked to show that $T(\boldsymbol{\alpha})$ defined in (13) is
also given by

$$T(\boldsymbol{\alpha}) = \inf_{p \in \mathcal{P}} [I(X, Y; W) + \alpha_1 H(X|W) + \alpha_2 H(Y|W)]. \tag{16}$$

Thus, from Theorem 1, the lower boundary $\overline{\mathcal{R}}$, and therefore $\mathcal{R}$, is
essentially determined by $T(\boldsymbol{\alpha})$ given by (16).

(4) Theorems 2 and 3 can be verified easily by using Theorem 4.
Thus, if $(R_0, R_1, R_2) \in \mathcal{R}$, from Theorem 4 for arbitrary $\epsilon > 0$ we can
find a triple of random variables $X$, $Y$, $W$ such that

$$\begin{aligned} R_0 + R_1 + R_2 &\ge I(X, Y; W) + H(X|W) + H(Y|W) - \epsilon \\ &= H(X, Y) + [H(X|W) + H(Y|W) - H(X, Y|W)] - \epsilon \\ &\ge H(X, Y) - \epsilon \to H(X, Y), \quad \text{as } \epsilon \to 0. \end{aligned} \tag{17}$$

This is Theorem 2(a). The second inequality in (17) follows from the
fact that the entropy of a pair of random variables is no greater than the sum
of the respective entropies. Part (b) of Theorem 2 follows from

$$\begin{aligned} I(X, Y; W) + H(X|W) &= I(X; W) + I(Y; W|X) + H(X|W) \\ &\ge I(X; W) + H(X|W) = H(X). \end{aligned} \tag{18}$$

The first equality in (18) follows from a standard identity [Ref. 4,
Eq. (2.2.29)].

Theorem 3 follows from Theorem 4 on taking $W$ as follows: (A)
$W = (X, Y)$, (B) $W = 0$ (i.e., $W$ constant), (C) $W = Y$, (D) $W = X$.
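The four choices of $W$ above can be checked numerically. The following Python sketch computes the triple $[I(X, Y; W), H(X|W), H(Y|W)]$ from a joint probability function $p(x, y, w)$ stored as a dictionary, and applies it to the four choices. The distribution `Q`, the dictionary representation, and the function names are assumptions made for illustration.

```python
import math

def H(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal(p, keep):
    """Marginalize {(x, y, w): prob} onto the coordinate positions in `keep`."""
    out = {}
    for key, prob in p.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + prob
    return out

def rate_triple(p):
    """Return (I(X,Y;W), H(X|W), H(Y|W)) for a joint p(x, y, w)."""
    h_w = H(marginal(p, (2,)).values())
    h_xy = H(marginal(p, (0, 1)).values())
    i_xy_w = h_xy + h_w - H(p.values())  # I(X,Y;W) = H(X,Y) + H(W) - H(X,Y,W)
    h_x_given_w = H(marginal(p, (0, 2)).values()) - h_w
    h_y_given_w = H(marginal(p, (1, 2)).values()) - h_w
    return (i_xy_w, h_x_given_w, h_y_given_w)

# Hypothetical source distribution, for illustration only.
Q = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

choices = {
    "A": {(x, y, (x, y)): q for (x, y), q in Q.items()},  # W = (X, Y)
    "B": {(x, y, 0): q for (x, y), q in Q.items()},       # W constant
    "C": {(x, y, y): q for (x, y), q in Q.items()},       # W = Y
    "D": {(x, y, x): q for (x, y), q in Q.items()},       # W = X
}
triples = {name: rate_triple(p) for name, p in choices.items()}
```

Each entry of `triples` should reproduce the corresponding triple of Theorem 3 for this `Q`, up to rounding.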

(5) Although Theorem 4 characterizes $\mathcal{R}$ and $\overline{\mathcal{R}}$ by an information
theoretic minimization, it must be emphasized that the minimization
is not, in general, easy. In fact, there is no nontrivial case for which
we have succeeded in calculating the entire boundary $\overline{\mathcal{R}}$ analytically.
Its major utility at this point has been in finding upper bounds on $\overline{\mathcal{R}}$
by guessing at a $p$ or $p_t$ and calculating the corresponding triple
$[I(X, Y; W), H(X|W), H(Y|W)]$, which must lie above $\overline{\mathcal{R}}$. See the
example below. The problem of computation of $\overline{\mathcal{R}}$ both analytically
and numerically is still open.*

(6) For $p \in \mathcal{P}$, we can define the quantities

$$\beta_{xy}(w) = \Pr\{X = x, Y = y \mid W = w\}, \quad x \in \mathcal{X},\ y \in \mathcal{Y},\ w \in \mathcal{W},$$

which can be thought of as the transition probabilities of the "backward
test channel." For a given $(x, y)$, we can think of $\beta_{xy} = \beta_{xy}(W)$ as a
random variable. Of course, $\beta_{xy}$ must satisfy

$$\beta_{xy} \ge 0, \tag{19a}$$

$$\sum_{x,y} \beta_{xy} = 1, \tag{19b}$$

and

$$E\beta_{xy} = Q(x, y), \tag{19c}$$

where the expectation is taken over the distribution for $W$. Further,

$$I(X, Y; W) = H(X, Y) - H(X, Y|W) = H(X, Y) - E\sum_{x,y} \beta_{xy} \log \frac{1}{\beta_{xy}}, \tag{20a}$$

$$H(X|W) = E\sum_{x} \beta_x^{(1)} \log \frac{1}{\beta_x^{(1)}}, \quad H(Y|W) = E\sum_{y} \beta_y^{(2)} \log \frac{1}{\beta_y^{(2)}}, \tag{20b}$$

where

$$\beta_x^{(1)} = \sum_{y} \beta_{xy} = \Pr\{X = x \mid W\}, \quad \text{and} \quad \beta_y^{(2)} = \sum_{x} \beta_{xy} = \Pr\{Y = y \mid W\}, \tag{20c}$$

and the expectation is taken over the distribution for $W$. Using this
idea, it is possible to characterize, for example, $T(\boldsymbol{\alpha})$ as follows (see
Ref. 3 for a precise proof of this characterization). Given $Q(x, y)$,
$x \in \mathcal{X}$, $y \in \mathcal{Y}$, define $\mathcal{B}$ as the family of collections of random variables,
$\{\beta_{xy}\}$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$, which satisfy (19). Then

$$T(\boldsymbol{\alpha}) = \min [I(X, Y; W) + \alpha_1 H(X|W) + \alpha_2 H(Y|W)],$$

where $I(X, Y; W)$, $H(X|W)$, $H(Y|W)$ are given by (20) and the
minimum (which can be shown to exist) is over all sets $\{\beta_{xy}\}$ in $\mathcal{B}$.

This characterization may have value in the computation problem,
since the quantities in (20) are linear functions of the joint distribution
function for the $\{\beta_{xy}\}$ and the constraints of (19) are also linear
inequalities in this distribution function. Thus, calculation of $T(\boldsymbol{\alpha})$ is a
linear programming problem.

* One reason for the difficulty is that $I(X, Y; W) + \alpha_1 H(X|W) + \alpha_2 H(Y|W)$ is
apparently neither convex nor concave in $p_t$.
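The backward test channel of remark (6) can be illustrated numerically. The following Python sketch starts from a source distribution and a forward test channel, forms $p(x, y, w)$, computes the backward probabilities $\beta_{xy}(w)$, and evaluates the quantities appearing in (19b), (19c), and (20a). The distribution `Q`, the binary test channel `p_t0`, and all names are hypothetical.

```python
import math

# Hypothetical source distribution and forward test channel Pr{W = 0 | x, y}.
Q = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
p_t0 = {(0, 0): 0.9, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.1}

# Joint probability function p(x, y, w) = Q(x, y) p_t(w | x, y), with W in {0, 1}.
p = {}
for (x, y), q in Q.items():
    p[(x, y, 0)] = q * p_t0[(x, y)]
    p[(x, y, 1)] = q * (1.0 - p_t0[(x, y)])

p_w = {w: sum(v for (x, y, ww), v in p.items() if ww == w) for w in (0, 1)}

# Backward test channel: beta[w][(x, y)] = Pr{X = x, Y = y | W = w}.
beta = {w: {xy: p[xy + (w,)] / p_w[w] for xy in Q} for w in (0, 1)}

# Left side of (19b) for each w, and the expectation in (19c) for each (x, y).
beta_sums = {w: sum(beta[w].values()) for w in (0, 1)}
beta_means = {xy: sum(p_w[w] * beta[w][xy] for w in (0, 1)) for xy in Q}

# (20a): I(X,Y;W) = H(X,Y) - E sum_{x,y} beta_xy log(1 / beta_xy).
H_xy = -sum(q * math.log2(q) for q in Q.values())
H_xy_given_w = sum(
    p_w[w] * sum(b * math.log2(1.0 / b) for b in beta[w].values() if b > 0)
    for w in (0, 1)
)
I_xy_w = H_xy - H_xy_given_w
```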

(7) If $p \in \mathcal{P}$ is such that $X$ and $Y$ are conditionally independent
given $W$, then $H(X, Y|W) = H(X|W) + H(Y|W)$. Thus, with
$R_0 = I(X, Y; W)$, $R_1 = H(X|W)$, $R_2 = H(Y|W)$,

$$R_0 + R_1 + R_2 = H(X, Y) - H(X, Y|W) + H(X|W) + H(Y|W) = H(X, Y),$$

and $(R_0, R_1, R_2) \in \mathcal{R}$ and lies on the Pangloss plane. Reference 3
shows that this class of triples (corresponding to a $p \in \mathcal{P}$, with $X$, $Y$
conditionally independent given $W$) completely characterizes the intersection
of $\mathcal{R}$ and the Pangloss plane.
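Remark (7) can be checked numerically. The following Python sketch builds a joint probability function in which $X$ and $Y$ are conditionally independent given $W$ and computes the sum of the three rates alongside $H(X, Y)$. The numerical values of `p_W`, `a`, and `b`, and the function names, are hypothetical.

```python
import math

def H(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal(p, keep):
    """Marginalize {(x, y, w): prob} onto the coordinate positions in `keep`."""
    out = {}
    for key, prob in p.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + prob
    return out

# Hypothetical ingredients: Pr{W = w}, Pr{X = x | W = w}, Pr{Y = y | W = w}.
p_W = {0: 0.3, 1: 0.7}
a = {0: (0.9, 0.1), 1: (0.2, 0.8)}
b = {0: (0.6, 0.4), 1: (0.25, 0.75)}

# X and Y are conditionally independent given W by construction.
p = {(x, y, w): p_W[w] * a[w][x] * b[w][y]
     for w in (0, 1) for x in (0, 1) for y in (0, 1)}

h_w = H(marginal(p, (2,)).values())
H_xy = H(marginal(p, (0, 1)).values())       # entropy of the induced source Q
R0 = H_xy + h_w - H(p.values())              # I(X,Y;W)
R1 = H(marginal(p, (0, 2)).values()) - h_w   # H(X|W)
R2 = H(marginal(p, (1, 2)).values()) - h_w   # H(Y|W)
rate_sum = R0 + R1 + R2
```

Here `rate_sum` should equal `H_xy` up to rounding, so the triple lies on the Pangloss plane of the source distribution induced by `p`.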

1.5 An example 

As an example of the preceding, let us consider the special case where
the source is the "doubly symmetric binary source" (DSBS), where
$\mathcal{X} = \mathcal{Y} = \{0, 1\}$, and

$$Q(x, y) = \tfrac{1}{2}(1 - p_0)\delta_{x,y} + \tfrac{1}{2}p_0(1 - \delta_{x,y}), \quad x, y = 0, 1, \tag{21}$$

and the parameter $p_0$ satisfies $0 \le p_0 \le \tfrac{1}{2}$. We can think of $X$ as being
an unbiased binary input into a binary symmetric channel (BSC) with
crossover probability $p_0$, and $Y$ as being the corresponding output,
or vice versa. To get a clearer picture of the set of achievable rates $\mathcal{R}$,
let us restrict ourselves to the plane in $(R_0, R_1, R_2)$-space where
$R_1 = R_2$. The intersection of $\mathcal{R}$ and this plane can be plotted in a
two-dimensional picture.

Let us first take a look at the implications of Theorems 2 and 3. In
this source,

$$H(X) = H(Y) = 1, \quad H(X|Y) = H(Y|X) = h(p_0)$$

and

$$H(X, Y) = H(X) + H(Y|X) = 1 + h(p_0),$$

where

$$h(\lambda) = -\lambda \log \lambda - (1 - \lambda) \log (1 - \lambda), \quad 0 \le \lambda \le 1 \tag{22}$$

is the entropy function. [We take $h(0) = h(1) = 0$.] With $R_1 = R_2$,
Theorem 2 yields

$$R_0 + 2R_1 \ge 1 + h(p_0), \tag{23a}$$

$$R_0 + R_1 \ge 1. \tag{23b}$$

Thus, $\mathcal{R}$ and therefore the lower boundary $\overline{\mathcal{R}}$ must lie above the lines
labeled a and b in Fig. 4.
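The entropies quoted above for the DSBS, and the bounds (23), can be checked numerically. The following Python sketch does so for one value of the crossover parameter; the value `p0 = 0.1` and the function names are assumptions made for illustration.

```python
import math

def h(lam):
    """Binary entropy function (22), with h(0) = h(1) = 0."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * math.log2(lam) - (1.0 - lam) * math.log2(1.0 - lam)

def dsbs(p0):
    """Joint distribution (21) of the doubly symmetric binary source."""
    return {(x, y): 0.5 * (1.0 - p0) if x == y else 0.5 * p0
            for x in (0, 1) for y in (0, 1)}

def H(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p0 = 0.1  # illustrative crossover parameter
Q = dsbs(p0)
H_xy = H(Q.values())
H_x = H([Q[(0, 0)] + Q[(0, 1)], Q[(1, 0)] + Q[(1, 1)]])
H_y_given_x = H_xy - H_x

def satisfies_23(R0, R1, tol=1e-12):
    """True if (R0, R1 = R2) satisfies both (23a) and (23b)."""
    return R0 + 2 * R1 >= 1 + h(p0) - tol and R0 + R1 >= 1 - tol
```

For instance, `satisfies_23(1 + h(p0), 0.0)` and `satisfies_23(0.0, 1.0)` both hold, while a pair such as `(0.5, 0.4)` violates (23a) for this `p0`.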

Now Theorem 3 implies that points A $[R_0 = 1 + h(p_0), R_1 = 0]$
and B $(R_0 = 0, R_1 = 1)$ are achievable, so that any point on the line
connecting them is also achievable. But we can do better. Let us drop for
a moment the requirement that $R_1 = R_2$. From Theorem 3, C and D, the
points $[R_0 = 1, R_1 = h(p_0), R_2 = 0]$ and $[R_0 = 1, R_1 = 0, R_2 = h(p_0)]$,
are achievable. Thus, the point in $(R_0, R_1, R_2)$-space halfway between
them is also achievable. But this point,

$$[R_0 = 1,\ R_1 = \tfrac{1}{2}h(p_0),\ R_2 = \tfrac{1}{2}h(p_0)],$$

satisfies $R_1 = R_2$, and is therefore of interest to us now. Point F in
Fig. 4 is therefore achievable, and therefore so are line segments AF
and FB. But line segment AF coincides with line a, so that it must be
on the boundary $\overline{\mathcal{R}}$. So far, the unknown part of the boundary curve
$\overline{\mathcal{R}}$ lies in triangle FHB. We can do better, however, by using Theorem 4.

Theorem 4 asserts that any triple in $\mathcal{R}^{(p)}$, $p \in \mathcal{P}$, is achievable. We
therefore guess at a $p \in \mathcal{P}$ that defines random variables $X$, $Y$, $W$,
and then assert that the triple $R_0 = I(X, Y; W)$, $R_1 = H(X|W)$,
$R_2 = H(Y|W)$ is achievable. Since we choose a $p \in \mathcal{P}$ such that
$R_1 = R_2$, this triple is of interest in our present discussion. The $p \in \mathcal{P}$
we have chosen is (with $\mathcal{W} = \{0, 1\}$) given by Table I. The quantity
$p_1 = \tfrac{1}{2}(1 - \sqrt{1 - 2p_0})$. One way of characterizing $p$ is to think of $W$
as an unbiased binary input and $X$, $Y$ the respective outputs of two
independent BSC's, each with crossover probability $p_1$. Note that
these two BSC's in cascade are equivalent to a single BSC with crossover
probability $2p_1(1 - p_1) = p_0$.

With $X$, $Y$, $W$ so defined, $X$, $Y$ are conditionally independent given
$W$, so that $(R_0, R_1, R_2)$ lies on the Pangloss plane. [See remark (7)
following Theorem 4.] We have

$$\begin{aligned} R_0 &= I(X, Y; W) = H(X, Y) - H(X, Y|W) = 1 + h(p_0) - 2h(p_1), \\ R_1 &= R_2 = H(X|W) = h(p_1). \end{aligned} \tag{24}$$



This is point G in Fig. 4. Line segment AG is therefore on the boundary
$\overline{\mathcal{R}}$. From these simple arguments, we see that the unknown part of
the boundary $\overline{\mathcal{R}}$ lies in the triangle GHB.
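The construction of point G can be checked numerically from its verbal description: $W$ an unbiased bit, and $X$, $Y$ the outputs of two independent BSC's with crossover probability $p_1$ driven by $W$. The following Python sketch builds that joint distribution and computes the rates directly from it. The value `p0 = 0.1` and the function names are assumptions; since Table I is not reproduced here, the distribution is taken from the verbal description only.

```python
import math

def h(lam):
    """Binary entropy function, with h(0) = h(1) = 0."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * math.log2(lam) - (1.0 - lam) * math.log2(1.0 - lam)

def H(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def bsc(out, inp, eps):
    """Transition probability of a BSC with crossover probability eps."""
    return 1.0 - eps if out == inp else eps

p0 = 0.1  # illustrative DSBS parameter
p1 = 0.5 * (1.0 - math.sqrt(1.0 - 2.0 * p0))

# p(x, y, w): W unbiased; X and Y are outputs of independent BSC(p1)'s fed by W.
p = {(x, y, w): 0.5 * bsc(x, w, p1) * bsc(y, w, p1)
     for x in (0, 1) for y in (0, 1) for w in (0, 1)}

Q = {(x, y): p[(x, y, 0)] + p[(x, y, 1)] for x in (0, 1) for y in (0, 1)}
p_xw = {(x, w): p[(x, 0, w)] + p[(x, 1, w)] for x in (0, 1) for w in (0, 1)}

H_w = 1.0                                    # W is an unbiased bit
R0 = H(Q.values()) + H_w - H(p.values())     # I(X,Y;W)
R1 = H(p_xw.values()) - H_w                  # H(X|W), equal to H(Y|W) by symmetry
```

The computed pair `(R0, R1)` should agree with (24), and `R0 + 2 * R1` should equal $1 + h(p_0)$, so that G lies on line a.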

To obtain a still tighter bound on $\overline{\mathcal{R}}$, we employ the same technique
as above, i.e., "guessing" at a $p \in \mathcal{P}$ and then deducing that $\mathcal{R}^{(p)} \subseteq \mathcal{R}$.
Let $\beta$ be a parameter for which

$$p_1 = \tfrac{1}{2}(1 - \sqrt{1 - 2p_0}) \le \beta \le \tfrac{1}{2}. \tag{25}$$

Then let $\mathcal{W} = \{0, 1\}$, and $p(x, y, w)$ be given by Table II. Then the
triples $(R_0, R_1, R_2) \in \mathcal{R}$, where

Table II

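$$\begin{aligned} R_0 &= I(X, Y; W) = H(X, Y) - H(X, Y|W) \\ &= 1 + h(p_0) + \Big(1 - \beta - \frac{p_0}{2}\Big) \log \Big(1 - \beta - \frac{p_0}{2}\Big) + p_0 \log \frac{p_0}{2} + \Big(\beta - \frac{p_0}{2}\Big) \log \Big(\beta - \frac{p_0}{2}\Big) \end{aligned} \tag{26a}$$

and

$$R_1 = R_2 = H(X|W) = H(Y|W) = h(\beta). \tag{26b}$$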



For $\beta = p_1$, the triple of (26) coincides with that of (24), i.e., point G
in Fig. 4. For $\beta = \tfrac{1}{2}$, the triple of (26) is $R_0 = 0$, $R_1 = R_2 = 1$, i.e.,
point B of Fig. 4. As $\beta$ increases from $p_1$ to $\tfrac{1}{2}$, the family of rate-triples
of (26) generates a curve c, which lies below the line GB and therefore
constitutes a tighter upper bound on $\overline{\mathcal{R}}$. We conclude that the unknown
portion of $\overline{\mathcal{R}}$ lies in the shaded region in Fig. 4.
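The behavior of curve c can be explored numerically. The following Python sketch rests on an assumption that should be read with care: since Table II is not reproduced here, the conditional distribution given $W = 0$ is taken to be $\Pr\{0,0\} = 1 - \beta - p_0/2$, $\Pr\{0,1\} = \Pr\{1,0\} = p_0/2$, $\Pr\{1,1\} = \beta - p_0/2$, with the bit-flipped version for $W = 1$. This is the symmetric choice consistent with $H(X|W) = H(Y|W) = h(\beta)$ and with the DSBS marginal, but it is a reconstruction, not a quotation of the table. The sketch also assumes $0 < p_0 < \tfrac{1}{2}$; all names are hypothetical.

```python
import math

def h(lam):
    """Binary entropy function, with h(0) = h(1) = 0."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * math.log2(lam) - (1.0 - lam) * math.log2(1.0 - lam)

def xlog2x(t):
    """t * log2(t), extended by continuity to 0 at t = 0."""
    return t * math.log2(t) if t > 0 else 0.0

def point_G(p0):
    """(R1, R0) of point G, from p1 = (1 - sqrt(1 - 2 p0)) / 2."""
    p1 = 0.5 * (1.0 - math.sqrt(1.0 - 2.0 * p0))
    return h(p1), 1.0 + h(p0) - 2.0 * h(p1)

def curve_c(beta, p0):
    """(R1, R0) on curve c under the assumed conditional distribution."""
    a = 1.0 - beta - p0 / 2.0   # assumed Pr{X = 0, Y = 0 | W = 0}
    d = beta - p0 / 2.0         # assumed Pr{X = 1, Y = 1 | W = 0}
    R0 = 1.0 + h(p0) + xlog2x(a) + p0 * math.log2(p0 / 2.0) + xlog2x(d)
    return h(beta), R0

def line_GB(R1, p0):
    """R0 on the straight line joining G to B = (R1 = 1, R0 = 0)."""
    g_R1, g_R0 = point_G(p0)
    return g_R0 * (1.0 - R1) / (1.0 - g_R1)

p0 = 0.1  # illustrative DSBS parameter
samples = [(beta,) + curve_c(beta, p0) for beta in (0.1, 0.2, 0.3, 0.4)]
```

Under these assumptions, each sampled point of the curve has an $R_0$ no larger than the value on line GB at the same $R_1$, and the curve passes through G and B at the two ends of the range of $\beta$.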

In Section 2.5 we give some insight into how we "guessed" at these
distributions $p \in \mathcal{P}$.

II. GENERALIZATION TO A FIDELITY CRITERION 

In this section we formulate a generalization of the problem of Section I
in which we require that the source sequences $\{X_k\}$ and $\{Y_k\}$
be reproduced to within a specified fidelity criterion and not, as in
Section I, essentially perfectly. The proofs of the main theorems appear
in Section III.

2.1 Definitions and formulation of the problem 

Let $\{(X_k, Y_k)\}_{k=1}^{\infty}$ be a sequence of independent drawings of a pair
of random variables $X \in \mathcal{X}$, $Y \in \mathcal{Y}$, where the "source alphabets"
$\mathcal{X}$ and $\mathcal{Y}$ are either discrete sets, the reals, or arbitrary measurable
spaces. We assume that we are given a probability law that defines
$(X, Y)$. If $\mathcal{X}$ and $\mathcal{Y}$ are discrete, then we write

$$Q(x, y) = \Pr\{X = x, Y = y\}, \quad x \in \mathcal{X},\ y \in \mathcal{Y}.$$

If $\mathcal{X}$, $\mathcal{Y}$ are the reals, then $(X, Y)$ may be defined by a probability
density $Q(x, y)$, $-\infty < x, y < \infty$. For arbitrary measurable $\mathcal{X}$, $\mathcal{Y}$,
the pair $(X, Y)$ is defined by a probability measure $Q$ on $\mathcal{X} \times \mathcal{Y}$. The
marginal distributions for $X$, $Y$ will be defined similarly by $Q_X$, $Q_Y$
respectively.

As in (5), define the set $I_m = \{0, 1, \cdots, m - 1\}$ for $m = 1, 2, \cdots$.
An encoder with parameters $(n, M_0, M_1, M_2)$ is [as in (6)] a mapping

$$f_E : \mathcal{X}^n \times \mathcal{Y}^n \to I_{M_0} \times I_{M_1} \times I_{M_2}. \tag{27}$$

We assume that the sequences $\{X_k\}$ and $\{Y_k\}$ are to be reproduced as
sequences of elements of sets $\hat{\mathcal{X}}$ and $\hat{\mathcal{Y}}$, respectively, called "reproducing
alphabets." Thus [as in (7)], corresponding to a given encoder,
a decoder is a pair of mappings

$$f_D^{(X)} : I_{M_0} \times I_{M_1} \to \hat{\mathcal{X}}^n, \tag{28a}$$

$$f_D^{(Y)} : I_{M_0} \times I_{M_2} \to \hat{\mathcal{Y}}^n. \tag{28b}$$

Let us adopt the convention of denoting $n$-vectors with bold-face
type (either upper or lower case) and the components as the same subscripted
letter in ordinary type. For example, $\mathbf{u} = (u_1, \cdots, u_n)$.

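An encoder-decoder with parameters $(n, M_0, M_1, M_2)$ is applied as
follows. Say

$$f_E(\mathbf{X}, \mathbf{Y}) = (S_0, S_1, S_2), \tag{29a}$$

where $\mathbf{X} \in \mathcal{X}^n$, $\mathbf{Y} \in \mathcal{Y}^n$, and $(S_0, S_1, S_2)$ is a triplet of indices. Then set

$$\hat{\mathbf{X}} = f_D^{(X)}(S_0, S_1), \quad \hat{\mathbf{Y}} = f_D^{(Y)}(S_0, S_2), \tag{29b}$$

where $\hat{\mathbf{X}} \in \hat{\mathcal{X}}^n$, $\hat{\mathbf{Y}} \in \hat{\mathcal{Y}}^n$. The encoder-decoder is said to have average
distortion $(\Delta_X, \Delta_Y)$, where

$$\Delta_X = E D_1(\mathbf{X}, \hat{\mathbf{X}}), \quad \Delta_Y = E D_2(\mathbf{Y}, \hat{\mathbf{Y}}), \tag{30a}$$

and the single-letter distortion functions are defined by

$$D_1(\mathbf{x}, \hat{\mathbf{x}}) = \frac{1}{n}\sum_{k=1}^{n} d_1(x_k, \hat{x}_k), \tag{30b}$$

$$D_2(\mathbf{y}, \hat{\mathbf{y}}) = \frac{1}{n}\sum_{k=1}^{n} d_2(y_k, \hat{y}_k), \tag{30c}$$

$\mathbf{x} \in \mathcal{X}^n$, $\hat{\mathbf{x}} \in \hat{\mathcal{X}}^n$, $\mathbf{y} \in \mathcal{Y}^n$, $\hat{\mathbf{y}} \in \hat{\mathcal{Y}}^n$, and $d_1(\cdot, \cdot)$ is a given nonnegative
per-letter distortion function for the $X$-receiver and $d_2(\cdot, \cdot)$ is a given
nonnegative per-letter distortion function for the $Y$-receiver. An
encoder-decoder with parameters $(n, M_0, M_1, M_2)$ with average distortion
$(\Delta_X, \Delta_Y)$ is said to be a code $(n, M_0, M_1, M_2, \Delta_X, \Delta_Y)$.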

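A rate-triple $(R_0, R_1, R_2)$ is said to be $(\Delta_1, \Delta_2)$-achievable if, for
arbitrary $\epsilon > 0$ and $n$ sufficiently large, there exists a code
$(n, M_0, M_1, M_2, \Delta_X, \Delta_Y)$ with

$$M_i \le 2^{n(R_i + \epsilon)}, \quad i = 0, 1, 2,$$

and

$$\Delta_X \le \Delta_1 + \epsilon, \quad \Delta_Y \le \Delta_2 + \epsilon.$$

The set of all $(\Delta_1, \Delta_2)$-achievable rate-triples is called $\mathcal{R}(\Delta_1, \Delta_2)$. Our
main problem is to ascertain $\mathcal{R}(\Delta_1, \Delta_2)$, $\Delta_1, \Delta_2 \ge 0$. Clearly, this generalized
problem reduces to the problem of Section I if $\hat{\mathcal{X}} = \mathcal{X}$,
$\hat{\mathcal{Y}} = \mathcal{Y}$, $d_1 = d_2 = d_H$, and $\Delta_1 = \Delta_2 = 0$. As in Section I, the region
$\mathcal{R}(\Delta_1, \Delta_2)$ is completely defined by the boundary $\overline{\mathcal{R}}(\Delta_1, \Delta_2)$, where
$\overline{\mathcal{R}} = \overline{\mathcal{R}}(\Delta_1, \Delta_2)$ is defined in (11). Further, we show in the appendix
that $\mathcal{R}(\Delta_1, \Delta_2)$ is convex and that Theorem 1 holds with $\mathcal{R} = \mathcal{R}(\Delta_1, \Delta_2)$.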

2.2 Rate-distortion functions and conditional rate-distortion functions 

A major tool in this study is rate-distortion theory. Specifically, 
joint, marginal, and conditional rate-distortion functions (or simply 
"rates") are used both in evaluations and bounds. These functions 
and their properties are dealt with in Refs. 1, 4, and 5. Here we only 
summarize some pertinent definitions and properties. 

The marginal, joint, and conditional rates are defined as follows.
Consider first the case where the alphabets $\mathcal{X}$, $\hat{\mathcal{X}}$, $\mathcal{Y}$, $\hat{\mathcal{Y}}$ are finite and
$Q(x, y)$, $Q_X(x)$, $Q_Y(y)$ are probability functions. Then the (joint) rate-distortion
function is defined by

$$R_{XY}(\Delta_1, \Delta_2) = \min I(X, Y; \hat{X}, \hat{Y}), \tag{31}$$

where the random variables $\hat{X}$, $\hat{Y}$ are defined by a "test-channel"
$q_t(\hat{x}, \hat{y} | x, y)$, i.e., a probability function on $\hat{\mathcal{X}} \times \hat{\mathcal{Y}}$ for every $(x, y) \in \mathcal{X} \times \mathcal{Y}$.
The information in (31) is calculated for the joint distribution

$$\Pr\{X = x, Y = y, \hat{X} = \hat{x}, \hat{Y} = \hat{y}\} = Q(x, y) q_t(\hat{x}, \hat{y} | x, y). \tag{32}$$

The minimum in (31) is taken with respect to all test channels $q_t$ such
that $E d_1(X, \hat{X}) \le \Delta_1$, $E d_2(Y, \hat{Y}) \le \Delta_2$, where the expectations are
taken with respect to the distribution (32). The minimum always
exists. Similarly, the marginal rates are defined by

$$R_X(\Delta_1) = \min_{q_t(\hat{x}|x):\ E d_1(X, \hat{X}) \le \Delta_1} I(X; \hat{X}), \tag{33a}$$

$$R_Y(\Delta_2) = \min_{q_t(\hat{y}|y):\ E d_2(Y, \hat{Y}) \le \Delta_2} I(Y; \hat{Y}), \tag{33b}$$

where the expressions in (33) are interpreted analogously to that of
(31). Detailed discussions of these quantities and their significance
can be found in Refs. 1 and 4.

Another quantity that plays a crucial role in our study is the "conditional
rate-distortion function." Let $\mathcal{X}$, $\mathcal{Y}$ be finite, and let $Q(x, y)$
be given. Let $p(x, y, w)$ be a probability function on $\mathcal{X} \times \mathcal{Y} \times \mathcal{W}$,
where $\mathcal{W}$ is a finite set, such that $\sum_{w} p(x, y, w) = Q(x, y)$. Then
$p(x, y, w)$ defines a triple of random variables $X$, $Y$, $W$, where the
marginal distribution for $X$, $Y$ is $Q$. The conditional rate-distortion
functions are defined as

$$R_{XY|W}(\Delta_1, \Delta_2) = \min I(X, Y; \hat{X}, \hat{Y} | W), \tag{34}$$

where the minimum (which always exists) is taken with respect to all
test channels $q_t(\hat{x}, \hat{y} | x, y, w)$ such that $E d_1(X, \hat{X}) \le \Delta_1$, $E d_2(Y, \hat{Y}) \le \Delta_2$.
The conditional information in (34) is defined in Ref. 4, p. 21.
The conditional rates $R_{X|W}(\Delta_1)$, $R_{Y|W}(\Delta_2)$ are defined analogously. A
detailed discussion of conditional rates is given in Ref. 5. Of course,
these definitions are meaningful if $X = W$ or $Y = W$. Roughly speaking,
$R_{XY|W}(\Delta_1, \Delta_2)$ is the channel capacity required to transmit $X$, $Y$
and to reproduce it as $\hat{X}$, $\hat{Y}$ to within an average distortion $(\Delta_1, \Delta_2)$
when both the transmitter and receiver know $W$.

We shall need several properties of the conditional rate-distortion
function in the sequel. The first is given in Ref. 5. For $\Delta \ge 0$,

$$R_{X|W}(\Delta) = \min \sum_{w} \Pr\{W = w\} R_{X|W=w}(\Delta_w), \tag{35}$$

where $R_{X|W=w}(\cdot)$ is the rate-distortion function calculated for a source
with outputs $x \in \mathcal{X}$ with probability distribution $P_{X|W}(x|w)$ (the conditional
probability function for $X$ given $W = w$). The minimum is
taken over all sets $\{\Delta_w\}_{w \in \mathcal{W}}$ such that $\sum_{w} \Pr\{W = w\} \Delta_w \le \Delta$.
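As an illustration of (35), the following Python sketch approximates the minimization by a grid search in a simple special case: $X$ binary, $W$ two-valued, and Hamming distortion. In that case each $R_{X|W=w}(\cdot)$ is the well-known rate-distortion function of a Bernoulli source, $h(q) - h(D)$ for $D < q = \min(p, 1 - p)$ and zero otherwise. The special case, the grid search, and all names are assumptions made for illustration; this is not the method of Ref. 5.

```python
import math

def h(lam):
    """Binary entropy function, with h(0) = h(1) = 0."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * math.log2(lam) - (1.0 - lam) * math.log2(1.0 - lam)

def rd_binary(p, D):
    """R(D) of a Bernoulli(p) source under Hamming distortion."""
    q = min(p, 1.0 - p)
    return h(q) - h(D) if D < q else 0.0

def conditional_rd(pw, px1_given_w, Delta, grid=2001):
    """Grid-search approximation of (35) for W in {0, 1}.

    pw[w] = Pr{W = w}; px1_given_w[w] = Pr{X = 1 | W = w}.
    Because each R(D) is nonincreasing, only allocations that use the
    whole distortion budget pw[0]*d0 + pw[1]*d1 = Delta are searched.
    """
    best = float("inf")
    for i in range(grid):
        d0 = (Delta / pw[0]) * i / (grid - 1)
        d1 = max(0.0, (Delta - pw[0] * d0) / pw[1])  # clamp rounding error
        rate = (pw[0] * rd_binary(px1_given_w[0], d0)
                + pw[1] * rd_binary(px1_given_w[1], d1))
        best = min(best, rate)
    return best

# Hypothetical example: equally likely W, with different conditional biases.
rate_example = conditional_rd((0.5, 0.5), (0.1, 0.4), 0.05)
```

At `Delta = 0` the search returns the conditional entropy $H(X|W)$, and when both conditional distributions coincide, the best allocation is the uniform one, as convexity of the rate-distortion function suggests.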

A second fact of importance is that, say, $R_{X|W}(\Delta)$ is a
continuous, convex, nonincreasing function of $\Delta$ for
$\Delta \ge 0$. That $R_{X|W}(\Delta)$ is nonincreasing follows from the
definition. The proof that it is convex parallels the proof of the
convexity of the ordinary rate-distortion function. The continuity of
$R_{X|W}(\Delta)$, $\Delta > 0$, follows from its convexity.

Finally, the continuity of $R_{X|W}(\Delta)$ at $\Delta = 0$ follows
from (35), and the continuity of $R_{X|W=w}(\Delta)$ at $\Delta = 0$.

A third fact we shall need is that, for any $X$, $W_1$, $W_2$,
$\Delta \ge 0$,

$$R_{X|W_1 W_2}(\Delta) \le R_{X|W_1}(\Delta). \tag{36}$$

This follows from
$R_{X|W_1 W_2}(\Delta) = \inf I(X; \hat{X} \mid W_1 W_2)$, where the
infimum is with respect to test channels
$q_t(\hat{x} \mid x, w_1, w_2)$ such that $E d_1(X, \hat{X}) \le \Delta$.
Included in this class of test channels are those that are independent
of $w_2$, i.e., $q_t(\hat{x} \mid x, w_1, w_2) = q_t(\hat{x} \mid x, w_1)$.
This subclass is exactly the class of test channels in the minimization
for computing $R_{X|W_1}(\Delta)$.

The final property of conditional rates is stated as a lemma below. 
The proof is given in the appendix. 

Let $X \in \mathcal{X}$ be a random variable with probability
distribution $Q_X(x) = \Pr\{X = x\}$, where $\mathcal{X}$ is a finite
source alphabet. Let $\hat{\mathcal{X}}$ be a finite reproducing
alphabet and let $d(x, \hat{x}) \ge 0$, $x \in \mathcal{X}$,
$\hat{x} \in \hat{\mathcal{X}}$, be a distortion function.

Now let $\{\mathcal{W}_k\}_{k=1}^{n}$ be a family of disjoint finite sets
and let $\{p_k(x, w)\}_{k=1}^{n}$ be a family of probability
distributions on $\mathcal{X} \times \mathcal{W}_k$ such that

$$\sum_{w \in \mathcal{W}_k} p_k(x, w) = Q_X(x).$$

The random pairs $(X, W_k)$ are defined by

$$\Pr\{X = x, W_k = w\} = p_k(x, w), \quad x \in \mathcal{X},\ w \in \mathcal{W}_k.$$

Let $R_{X|W_k}(\Delta)$, $\Delta \ge 0$, be the corresponding conditional
rate-distortion function.

Next, set $\mathcal{W} = \sum_{k=1}^{n} \mathcal{W}_k$, where $\sum$
indicates union of disjoint sets. Define the probability distribution
on $\mathcal{X} \times \mathcal{W}$:

$$p^*(x, w) = \frac{1}{n}\, p_k(x, w), \quad \text{for } w \in \mathcal{W}_k,\ 1 \le k \le n,$$

and let $(X, W)$ be the corresponding random pair with conditional
rate-distortion function $R_{X|W}(\Delta)$, $\Delta \ge 0$. Clearly,
$p^*(\cdot)$ is a mixture of the $n$ disjoint probability distributions
$\{p_k\}$, with prior probability $1/n$. We now state the lemma.

Lemma 5: For arbitrary $\{\Delta_k\}_{k=1}^{n}$, $\Delta_k \ge 0$,
We note here that all the above is meaningful for the case where
$Q(x, y)$ is a probability density function or $Q$ is an abstract
probability measure. We need only make the obvious correspondences
between discrete distributions and more general probability measures
and replace "minimum" in (31), (33), (34), and (35) by "infimum."

We conclude this section by taking a look at the specialization of the
above to the case where $d_1 = d_2 = d_H$, the Hamming distortion
defined in (1b), and $\Delta_1 = \Delta_2 = 0$. Then

$$R_{XY}(0, 0) = H(X, Y), \quad R_X(0) = H(X), \quad R_Y(0) = H(Y),$$
$$R_{XY|W}(0, 0) = H(X, Y \mid W), \quad R_{X|W}(0) = H(X \mid W), \quad R_{Y|W}(0) = H(Y \mid W),$$

where the entropy $H(\cdot)$ and the conditional entropy
$H(\cdot \mid \cdot)$ are defined in (2) and (3), respectively.
Analogous to the relation

$$H(X \mid Y) + H(Y) = H(X, Y) \le H(X) + H(Y), \tag{37a}$$

which holds for this special case, the following is established in
Ref. 5 for the general case:

$$R_{X|Y}(\Delta_1) + R_Y(\Delta_2) \le R_{XY}(\Delta_1, \Delta_2) \le R_X(\Delta_1) + R_Y(\Delta_2). \tag{37b}$$

Further, it is shown in Ref. 5, Corollary 3.2, that the left inequality
in (37b) holds with equality in some neighborhood of the origin
$\{(\Delta_1, \Delta_2) : 0 \le \Delta_1, \Delta_2 \le \gamma\}$,
provided that

$$Q(x, y) > 0, \quad \text{all } x \in \mathcal{X},\ y \in \mathcal{Y}, \tag{38a}$$

and $d_1$, $d_2$ satisfy

$$d_1(x, \hat{x}) > d_1(x, x) = 0,\ x \ne \hat{x}; \qquad d_2(y, \hat{y}) > d_2(y, y) = 0,\ y \ne \hat{y}. \tag{38b}$$

2.3 Characterization of $\mathcal{R}(\Delta_1, \Delta_2)$: the main result

We first state two simple theorems that are generalizations of 
Theorems 2 and 3. The proofs are analogous to the proofs of Section I, 
and are therefore omitted. Theorem 6(a) is also called the Pangloss 
bound. 

Theorem 6: If $(R_0, R_1, R_2) \in \mathcal{R}(\Delta_1, \Delta_2)$, then

(a) $R_0 + R_1 + R_2 \ge R_{XY}(\Delta_1, \Delta_2)$.

(b) $R_0 + R_1 \ge R_X(\Delta_1)$.

(c) $R_0 + R_2 \ge R_Y(\Delta_2)$.
Theorem 7: The following triples belong to
$\mathcal{R}(\Delta_1, \Delta_2)$:

(A) $R_0 = R_{XY}(\Delta_1, \Delta_2)$, $R_1 = R_2 = 0$.

(B) $R_0 = 0$, $R_1 = R_X(\Delta_1)$, $R_2 = R_Y(\Delta_2)$.

It is also possible to generalize Theorem 3(C) and (D), but this must
await presentation of the main result, which we now give.

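Consider first the case where $\mathcal{X}$, $\mathcal{Y}$ are finite.
Let $Q(x, y)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$, be given. Now
let $\mathcal{P}$ be the family of probability functions $p(x, y, w)$,
where $x \in \mathcal{X}$, $y \in \mathcal{Y}$, $w \in \mathcal{W}$,
$\mathcal{W}$ is another finite set, and

$$\sum_{w \in \mathcal{W}} p(x, y, w) = Q(x, y), \quad x \in \mathcal{X},\ y \in \mathcal{Y}. \tag{39}$$

Thus, $\mathcal{P}$ is exactly as in Section 1.4. Now each
$p \in \mathcal{P}$ defines three discrete random variables $X, Y, W$ in
the obvious way. For $p \in \mathcal{P}$ and $\Delta_1, \Delta_2 \ge 0$,
define the subset of Euclidean three-space

$$\mathcal{R}^{(p)}(\Delta_1, \Delta_2) = \{(R_0, R_1, R_2) : R_0 \ge I(X, Y; W),\ R_1 \ge R_{X|W}(\Delta_1),\ R_2 \ge R_{Y|W}(\Delta_2)\}. \tag{40a}$$

Then let

$$\mathcal{R}^*(\Delta_1, \Delta_2) = \Big[\bigcup_{p \in \mathcal{P}} \mathcal{R}^{(p)}(\Delta_1, \Delta_2)\Big]^c, \tag{40b}$$

where $(\,\cdot\,)^c$ denotes set closure. Since $R_{X|W}(\Delta_1)$ and
$R_{Y|W}(\Delta_2)$ are continuous for $\Delta_1, \Delta_2 \ge 0$, we
conclude that $\mathcal{R}^*(\Delta_1, \Delta_2)$ is continuous in
$(\Delta_1, \Delta_2)$ according to the Hausdorff set metric. This
metric $\rho(S_1, S_2)$ between two subsets $S_1$, $S_2$ of a Euclidean
space is defined by

$$\rho(S_1, S_2) = \sup_{r_1 \in S_1} \inf_{r_2 \in S_2} \|r_1 - r_2\| + \sup_{r_2 \in S_2} \inf_{r_1 \in S_1} \|r_1 - r_2\|,$$

where $\|\cdot\|$ denotes Euclidean norm.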
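
For intuition, the set metric just defined can be evaluated directly
when both sets are finite, since the suprema and infima then become
maxima and minima. The sketch below assumes finite point sets given as
tuples of coordinates; the function name is hypothetical, and the sets
$\mathcal{R}^*$ themselves are of course not finite.

```python
import math

def hausdorff_sum(s1, s2):
    """Set distance as defined in the text: the sum of the two one-sided terms.

    s1, s2 -- non-empty finite collections of points (tuples of floats).
    """
    def one_sided(a, b):
        # sup over points of a of the distance to the nearest point of b
        return max(min(math.dist(p, q) for q in b) for p in a)

    return one_sided(s1, s2) + one_sided(s2, s1)
```

For example, with `s1 = [(0, 0, 0)]` and `s2 = [(0, 0, 0), (3, 4, 0)]`,
the first one-sided term is 0 and the second is 5.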

If $Q$ is either a density or a probability measure, then
$\mathcal{R}^*(\Delta_1, \Delta_2)$ can be defined in an analogous way.
In this more general case, we must require that the source has the
property that there exist an $\hat{x} \in \hat{\mathcal{X}}$ and a
$\hat{y} \in \hat{\mathcal{Y}}$ such that

$$E d_1(X, \hat{x}) < \infty, \quad E d_2(Y, \hat{y}) < \infty. \tag{41}$$

If $\mathcal{X}$, $\mathcal{Y}$ are finite, then (41) is always
satisfied. We can now state our main result.

Theorem 8: $\mathcal{R}(\Delta_1, \Delta_2) = \mathcal{R}^*(\Delta_1, \Delta_2)$.

Remarks : 

(1) Theorem 8 reduces to Theorem 4 when $\mathcal{X}$, $\mathcal{Y}$ are
finite, $d_1 = d_2 = d_H$, and $\Delta_1 = \Delta_2 = 0$.

(2) If we define $\mathcal{P}_T$ as in remark (1) following Theorem 4
as the set of test channels $p_t(w \mid x, y)$, then $\mathcal{P}_T$ is
in 1-1 correspondence with $\mathcal{P}$.

(3) Since $\mathcal{R}(\Delta_1, \Delta_2)$ is convex, Theorem 8 implies
that $\mathcal{R}^*(\Delta_1, \Delta_2)$ is convex also.

(4) Since Theorem 1 is valid for $\mathcal{R}(\Delta_1, \Delta_2)$, the
present theorem implies that $T(\alpha)$, defined in (13), is also
given by

$$T(\alpha) = \inf_{p \in \mathcal{P}} \big[ I(X, Y; W) + \alpha_1 R_{X|W}(\Delta_1) + \alpha_2 R_{Y|W}(\Delta_2) \big]. \tag{42}$$

Thus, from Theorem 1, the lower boundary
$\overline{\mathcal{R}}(\Delta_1, \Delta_2)$, and therefore
$\mathcal{R}(\Delta_1, \Delta_2)$, is determined by $T(\alpha)$ given in
(42).

(5) As in remark (4) after Theorem 4, Theorems 6 and 7 can be
obtained directly from Theorem 8. The steps parallel those in remark
(4) and will be omitted. We will, however, give the generalization of
Theorem 3(C) and (D). We state this as follows. The following triples
$(R_0, R_1, R_2) \in \mathcal{R}(\Delta_1, \Delta_2)$:

(C) $R_0 = R_Y(\Delta_2)$, $R_1 = R_{X|\hat{Y}}(\Delta_1)$, $R_2 = 0$,

(D) $R_0 = R_X(\Delta_1)$, $R_1 = 0$, $R_2 = R_{Y|\hat{X}}(\Delta_2)$,

where the random variable $\hat{Y}$ is defined by the test channel that
achieves the infimum in $R_Y(\Delta_2)$ (assuming that the infimum can
be achieved; if not, a simple modification is possible), and $\hat{X}$
is defined by the test channel that achieves $R_X(\Delta_1)$. In the
discrete case, we can achieve point (C) as follows. Let
$p^*(\hat{y} \mid y)$ be the test channel that achieves
$I(Y; \hat{Y}) = R_Y(\Delta_2)$. Let $\mathcal{W} = \hat{\mathcal{Y}}$,
and let

$$p(x, y, \hat{y}) = Q(x, y)\, p^*(\hat{y} \mid y) \in \mathcal{P}.$$

The random variables $X, Y, \hat{Y}$ are defined in an obvious way by
$p(x, y, \hat{y})$. Further, since $X$, $\hat{Y}$ are conditionally
independent given $Y$,

$$I(X, Y; \hat{Y}) = I(Y; \hat{Y}) + I(X; \hat{Y} \mid Y) = I(Y; \hat{Y}) = R_Y(\Delta_2).$$

Also, the conditional rate $R_{Y|\hat{Y}}(\Delta_2) = 0$. Thus, from
Theorem 8, with $W = \hat{Y}$, we have
$(R_0, R_1, R_2) \in \mathcal{R}(\Delta_1, \Delta_2)$, where

$$R_0 = I(X, Y; W) = R_Y(\Delta_2),$$
$$R_1 = R_{X|W}(\Delta_1) = R_{X|\hat{Y}}(\Delta_1),$$
$$R_2 = R_{Y|W}(\Delta_2) = 0.$$

This is point (C). Point (D) is obtained on reversing the roles of $X$
and $Y$.
Since $\mathcal{R}(\Delta_1, \Delta_2)$ is convex, any convex
combination of points (A) and (B) of Theorem 7 and (C) and (D) above
also belongs to $\mathcal{R}(\Delta_1, \Delta_2)$. But there is no
guarantee in this case that points (C) and (D) will lie on the Pangloss
plane. There are cases for which a portion of the Pangloss plane is
known to be realizable, as is shown in the example below.

2.4 A technique for overbounding $\mathcal{R}(\Delta_1, \Delta_2)$

In this section we present an intuitively sensible ad hoc scheme for
choosing probability distributions $p \in \mathcal{P}$ that yield
triples $[I(X, Y; W), R_{X|W}(\Delta_1), R_{Y|W}(\Delta_2)]$ which are
often close to or actually on the boundary
$\overline{\mathcal{R}}(\Delta_1, \Delta_2)$. In fact, in many cases
this triple will lie on the Pangloss plane.

A natural coding scheme to apply to our network would be to send 
a "coarse" version of the source output (X, Y) over the common 
channel, and then send to each receiver over its private channel only 
the necessary "fine tuning" it needs to meet its fidelity requirement. 
This reasoning leads us to the following family of rate triples that
belong to $\mathcal{R}(\Delta_1, \Delta_2)$. Assume for simplicity that
$\mathcal{X}$, $\mathcal{Y}$, $\hat{\mathcal{X}}$, $\hat{\mathcal{Y}}$
are finite.

Let $\Delta_1, \Delta_2 \ge 0$ be given. Let $\beta_1$, $\beta_2$
satisfy

$$\beta_1 \ge \Delta_1, \quad \beta_2 \ge \Delta_2.$$

Now let $q_t(\hat{x}, \hat{y} \mid x, y)$ be the test channel that
achieves $I(X, Y; \hat{X}, \hat{Y}) = R_{XY}(\beta_1, \beta_2)$. Then
with $W = (\hat{X}, \hat{Y})$ we have that the triple
$(R_0, R_1, R_2) \in \mathcal{R}(\Delta_1, \Delta_2)$, where

$$R_0 = I(X, Y; W) = R_{XY}(\beta_1, \beta_2),$$

and

$$R_1 = R_{X|\hat{X}\hat{Y}}(\Delta_1), \quad R_2 = R_{Y|\hat{X}\hat{Y}}(\Delta_2). \tag{43}$$

Note that the rates corresponding to Theorem 7 (A) and (B) and to
points (C) and (D) in remark (5) following Theorem 8 can be generated
as special cases of the rate in (43). We do this as follows:

A: Let $(\beta_1, \beta_2) = (\Delta_1, \Delta_2)$.

B: Let $\beta_1$, $\beta_2$ be large enough so that
$R_{XY}(\beta_1, \beta_2) = 0$. Then $\hat{X}$, $\hat{Y}$ are
degenerate.

C: Let $\beta_1$ be large enough so that $R_X(\beta_1) = 0$, and let
$\beta_2 = \Delta_2$. Then $\hat{X}$ is degenerate.

D: Let $\beta_2$ be large enough so that $R_Y(\beta_2) = 0$, and let
$\beta_1 = \Delta_1$.

The power of this technique is illustrated by the following theorem,
which asserts that under weak assumptions the family of rates given
by (43) includes a substantial subfamily that lies on the Pangloss
bound and therefore on the boundary.

Theorem 9: Given a source that satisfies

(i) $\hat{\mathcal{X}} = \mathcal{X}$, $\hat{\mathcal{Y}} = \mathcal{Y}$, with $\mathcal{X}$, $\mathcal{Y}$ finite,

(ii) $Q(x, y) > 0$, all $x \in \mathcal{X}$, $y \in \mathcal{Y}$,

(iii) $d_1(x, \hat{x}) > d_1(x, x) = 0$, all distinct
$x, \hat{x} \in \mathcal{X}$, and $d_2(y, \hat{y}) > d_2(y, y) = 0$, all
distinct $y, \hat{y} \in \mathcal{Y}$.

Then there exist two neighborhoods of the origin

$$\mathcal{N}_1 = \{(\Delta_1, \Delta_2) : 0 \le \Delta_1, \Delta_2 \le a\},$$
$$\mathcal{N}_2 = \{(\beta_1, \beta_2) : 0 \le \beta_1, \beta_2 \le b\},$$

where $0 < a \le b$, such that, if $(\Delta_1, \Delta_2) \in \mathcal{N}_1$
and $(\beta_1, \beta_2) \in \mathcal{N}_2$, then

$$R_0 + R_1 + R_2 = R_{XY}(\Delta_1, \Delta_2),$$

where $(R_0, R_1, R_2)$ is given by (43).

The theorem can be proved using Shannon lower-bound techniques
(Refs. 1, 6) and, in particular, the proof is similar to that of
Theorem 32 in Ref. 5.
Since the proof requires the generation of special machinery that is 
only tangential to the main ideas in this paper, we have elected to 
omit it. 

2.5 Examples 

(A) Our first example will be the DSBS considered in the example
of Section 1.5. Here $\mathcal{X} = \mathcal{Y} = \{0, 1\}$, and

$$Q(x, y) = \tfrac{1}{2}(1 - p_0)\,\delta_{x,y} + \tfrac{1}{2} p_0 (1 - \delta_{x,y}), \quad x, y = 0, 1, \tag{44}$$

where the parameter $p_0 \in [0, \tfrac{1}{2}]$. The distortion function
will be the Hamming metric, i.e., $d_1 = d_2 = d_H$, where $d_H$ is
defined in (1b). Again, as in Section 1.4, we consider only the plane
in $(R_0, R_1, R_2)$-space where $R_1 = R_2$ and
$\Delta_1 = \Delta_2 = \Delta$. We employ the technique of Section 2.4
to obtain an upper bound for $\mathcal{R}(\Delta, \Delta)$.
Making use of Ref. 1, pp. 46-50 (Ex. 2.7.2), we have

$$R_{XY}(\beta, \beta) = \begin{cases} 1 + h(p_0) - 2h(\beta), & 0 \le \beta \le p_1, \\ L(1 - p_0) - \tfrac{1}{2}\{L(2\beta - p_0) + L[2(1 - \beta) - p_0]\}, & p_1 \le \beta \le \tfrac{1}{2}, \end{cases} \tag{45a}$$

where

$$p_1 = \tfrac{1}{2} - \tfrac{1}{2}\sqrt{1 - 2p_0}, \tag{45b}$$

$$h(\lambda) = -\lambda \log \lambda - (1 - \lambda) \log (1 - \lambda), \quad 0 \le \lambda \le 1, \tag{45c}$$

$$L(\lambda) = -\lambda \log \lambda, \quad 0 \le \lambda \le 1. \tag{45d}$$

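Now, from Ref. 1, the random variables $\hat{X}$ and $\hat{Y}$, which
satisfy $I(X, Y; \hat{X}, \hat{Y}) = R_{XY}(\beta, \beta)$, are such that

$$\Pr\{X = x \mid \hat{X} = \hat{x}, \hat{Y} = \hat{y}\} = \Pr\{X = x \mid \hat{X} = \hat{x}\} = (1 - \beta)\,\delta_{x,\hat{x}} + \beta(1 - \delta_{x,\hat{x}}), \quad x, \hat{x}, \hat{y} = 0, 1, \tag{46a}$$

and

$$\Pr\{Y = y \mid \hat{X} = \hat{x}, \hat{Y} = \hat{y}\} = \Pr\{Y = y \mid \hat{Y} = \hat{y}\} = (1 - \beta)\,\delta_{y,\hat{y}} + \beta(1 - \delta_{y,\hat{y}}), \quad y, \hat{y}, \hat{x} = 0, 1. \tag{46b}$$

Thus, again from Ref. 1 (p. 46, Ex. 2.7.1), for
$0 \le \Delta \le \beta$,

$$R_{X|\hat{X}\hat{Y}}(\Delta) = R_{X|\hat{X}}(\Delta) = R_{Y|\hat{X}\hat{Y}}(\Delta) = R_{Y|\hat{Y}}(\Delta) = h(\beta) - h(\Delta). \tag{47}$$

Thus, we conclude that, for arbitrary
$0 \le \Delta \le \beta \le \tfrac{1}{2}$, the triple
$(R_0, R_1, R_2) \in \mathcal{R}(\Delta, \Delta)$, where
$R_0 = R_{XY}(\beta, \beta)$ [as in (45)], and
$R_1 = R_2 = h(\beta) - h(\Delta)$. Let us note that, for
$0 \le \Delta \le \beta \le p_1$, these rate-triples $(R_0, R_1, R_2)$
satisfy

$$R_0 + R_1 + R_2 = 1 + h(p_0) - 2h(\Delta) = R_{XY}(\Delta, \Delta), \tag{48}$$

and therefore lie on the Pangloss plane and on
$\overline{\mathcal{R}}(\Delta, \Delta)$. One special case occurs when
$\Delta = 0$, $\beta = p_1$. This yields the rate-triple of (24), i.e.,
point G in Fig. 4. In fact, the distribution
$p(x, y, w) \in \mathcal{P}$, which we guessed at in Section 1.5, was
obtained by setting $W = (\hat{X}, \hat{Y})$, where $\hat{X}$, $\hat{Y}$
are as above for $\beta = p_1$.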
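
The formulas of this example are easy to check numerically. The sketch
below evaluates (45) and the rate triple built from (47), assuming
base-2 logarithms (rates in bits); the function names are hypothetical.
It can be used to confirm that the two branches of (45a) agree at
$\beta = p_1$ and that (48) holds for $\Delta \le \beta \le p_1$.

```python
import math

def h(x):
    """Binary entropy (45c), in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def L(x):
    """L(x) = -x log x of (45d), with L(0) = 0."""
    return 0.0 if x <= 0.0 else -x * math.log2(x)

def p1_of(p0):
    """Breakpoint p1 of (45b)."""
    return 0.5 - 0.5 * math.sqrt(1 - 2 * p0)

def r_xy(beta, p0):
    """R_XY(beta, beta) for the DSBS, following (45a); 0 <= beta <= 1/2."""
    if beta <= p1_of(p0):
        return 1 + h(p0) - 2 * h(beta)
    return L(1 - p0) - 0.5 * (L(2 * beta - p0) + L(2 * (1 - beta) - p0))

def rate_triple(delta, beta, p0):
    """(R0, R1, R2) of Section 2.4 for this example; needs delta <= beta."""
    r_private = h(beta) - h(delta)  # equation (47)
    return r_xy(beta, p0), r_private, r_private
```

With $p_0 = 0.375$ the breakpoint is $p_1 = 0.25$; then
`sum(rate_triple(0.05, 0.2, 0.375))` and `r_xy(0.05, 0.375)` agree up to
rounding, as (48) asserts.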

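(B) Our second example is a source where $Q(x, y)$ is a density
function and $\mathcal{X}$, $\hat{\mathcal{X}}$, $\mathcal{Y}$,
$\hat{\mathcal{Y}}$ are the reals. The ad hoc technique used in the
previous example (A) will work here with obvious modifications. The
random variables $X$, $Y$ in this case will be jointly gaussian with
$EX = EY = 0$, $EX^2 = EY^2 = 1$, and $EXY = r$, $0 \le r \le 1$. Thus,
the density

$$Q(x, y) = \frac{1}{2\pi (1 - r^2)^{1/2}} \exp\left[ -\frac{x^2 + y^2 - 2rxy}{2(1 - r^2)} \right].$$

We take the distortion to be $d_1(\cdot, \cdot) = d_2(\cdot, \cdot)$,
where

$$d_1(x, \hat{x}) = (x - \hat{x})^2, \quad -\infty < x, \hat{x} < \infty. \tag{49}$$

For $0 < \beta < \infty$, it can be shown (Refs. 1, 4) that

$$R_{XY}(\beta, \beta) = \begin{cases} \dfrac{1}{2} \log \dfrac{1 - r^2}{\beta^2}, & 0 < \beta \le 1 - r, \\[2ex] \dfrac{1}{2} \log \dfrac{1 + r}{2\beta - (1 - r)}, & 1 - r \le \beta \le 1, \\[2ex] 0, & \beta \ge 1. \end{cases} \tag{50}$$

Further, the random variables $\hat{X}$, $\hat{Y}$ which satisfy
$I(X, Y; \hat{X}, \hat{Y}) = R_{XY}(\beta, \beta)$, $0 < \beta \le 1$,
are such that, given $\hat{X} = \hat{x}$, $\hat{Y} = \hat{y}$, $X$ and
$Y$ are gaussian with

$$E(X \mid \hat{X} = \hat{x}, \hat{Y} = \hat{y}) = \hat{x},$$
$$E(Y \mid \hat{X} = \hat{x}, \hat{Y} = \hat{y}) = \hat{y},$$
$$\operatorname{var}(X \mid \hat{X} = \hat{x}, \hat{Y} = \hat{y}) = \operatorname{var}(Y \mid \hat{X} = \hat{x}, \hat{Y} = \hat{y}) = \beta.$$

Thus (Refs. 1-4), for $0 < \Delta \le \beta < 1$,

$$R_{X|\hat{X}\hat{Y}}(\Delta) = R_{Y|\hat{X}\hat{Y}}(\Delta) = \frac{1}{2} \log \frac{\beta}{\Delta}.$$

Thus, we conclude that, for arbitrary $0 < \Delta \le \beta \le 1$, the
triple $(R_0, R_1, R_2) \in \mathcal{R}(\Delta, \Delta)$, where
$R_0 = R_{XY}(\beta, \beta)$ [as in (50)] and
$R_1 = R_2 = \tfrac{1}{2} \log (\beta/\Delta)$. Again, observe that for
$0 < \Delta \le \beta \le 1 - r$,

$$R_0 + R_1 + R_2 = \frac{1}{2} \log \left( \frac{1 - r^2}{\Delta^2} \right) = R_{XY}(\Delta, \Delta), \tag{51}$$

and therefore $(R_0, R_1, R_2)$ lies on the Pangloss plane and therefore
on $\overline{\mathcal{R}}(\Delta, \Delta)$.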
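
Relation (51) can be checked in a few lines. The sketch below uses only
the low-distortion branch $\beta \le 1 - r$ of the joint
rate-distortion function, which is the branch that (51) relies on, and
assumes base-2 logarithms; the function names are hypothetical.

```python
import math

def r_xy_low_distortion(beta, r):
    """R_XY(beta, beta) for the unit-variance gaussian pair, 0 < beta <= 1 - r."""
    if not 0 < beta <= 1 - r:
        raise ValueError("this sketch covers only 0 < beta <= 1 - r")
    return 0.5 * math.log2((1 - r * r) / beta ** 2)

def rate_triple(delta, beta, r):
    """(R0, R1, R2) with W = (X_hat, Y_hat); needs 0 < delta <= beta <= 1 - r."""
    r_private = 0.5 * math.log2(beta / delta)  # conditional rate given (X_hat, Y_hat)
    return r_xy_low_distortion(beta, r), r_private, r_private
```

For instance, with $r = 0.5$, $\beta = 0.3$, and $\Delta = 0.1$, the sum
of the triple equals `r_xy_low_distortion(0.1, 0.5)`, i.e.,
$\tfrac{1}{2} \log_2 75$.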

III. PROOF OF THE MAIN RESULT— THEOREM 8 

The proof of Theorem 8 consists of two parts: (i) the "converse"
part, which asserts that any point in $\mathcal{R}(\Delta_1, \Delta_2)$
belongs to $\mathcal{R}^*(\Delta_1, \Delta_2)$, and (ii) the "direct" or
"positive" part, which asserts that any point in
$\mathcal{R}^*(\Delta_1, \Delta_2)$ belongs to
$\mathcal{R}(\Delta_1, \Delta_2)$. We give the proof for the case where
$\mathcal{X}$, $\mathcal{Y}$ are finite sets. The proof for arbitrary
$\mathcal{X}$, $\mathcal{Y}$ follows in a parallel way with integrals
replacing sums, etc., in the standard way. We will begin with the
converse.

3.1 The converse 

Let $(f_E, f_D^{(1)}, f_D^{(2)})$ define a code
$(n, M_0, M_1, M_2, \Delta_X, \Delta_Y)$. We find a
$p^*(x, y, w) \in \mathcal{P}$ for an appropriate set $\mathcal{W}$ such
that

$$\left( \frac{1}{n} \log M_0,\ \frac{1}{n} \log M_1,\ \frac{1}{n} \log M_2 \right) \in \mathcal{R}^{(p^*)}(\Delta_X, \Delta_Y) \subseteq \mathcal{R}^*(\Delta_X, \Delta_Y). \tag{52}$$

The converse follows on applying the definition of
$(\Delta_1, \Delta_2)$-achievable rates and applying the continuity of
$\mathcal{R}^*$ as discussed in Section II.
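First, let $f_E(\mathbf{X}, \mathbf{Y}) = (S_0, S_1, S_2)$, where
$S_i \in I_{M_i}$ is a random variable ($i = 0, 1, 2$). Then we have

$$\begin{aligned} \frac{1}{n} \log M_0 &\overset{(1)}{\ge} \frac{1}{n} H(S_0) \overset{(2)}{\ge} \frac{1}{n} I(\mathbf{X}, \mathbf{Y}; S_0) \overset{(3)}{=} \frac{1}{n} [H(\mathbf{X}, \mathbf{Y}) - H(\mathbf{X}, \mathbf{Y} \mid S_0)] \\ &\overset{(4)}{=} \frac{1}{n} \sum_{k=1}^{n} [H(X_k, Y_k) - H(X_k, Y_k \mid S_0, X_1, \ldots, X_{k-1}, Y_1, \ldots, Y_{k-1})]. \end{aligned} \tag{53}$$

These steps are justified as follows:

(1) From $S_0 \in I_{M_0}$.

(2) Standard inequality.

(3) Definition of $I(\mathbf{X}, \mathbf{Y}; S_0)$.

(4) $H(\mathbf{X}, \mathbf{Y}) = \sum_k H(X_k, Y_k)$ follows from the
independence of the pairs $(X_k, Y_k)$, $k = 1, 2, \ldots, n$. The rest
is also a standard identity.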

Now, for $1 \le k \le n$, let
$W_k = (S_0, X_1, \ldots, X_{k-1}, Y_1, \ldots, Y_{k-1})$, a random
variable belonging to, say, $\mathcal{W}_k$, a finite set.† Relation
(53) is then

$$\frac{1}{n} \log M_0 \ge \frac{1}{n} \sum_{k=1}^{n} I(X_k, Y_k; W_k). \tag{54}$$

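Next, let $\hat{\mathbf{X}} = f_D^{(1)} \circ f_E(\mathbf{X}, \mathbf{Y})$.
Let $\Delta_{1k} = E d_1(X_k, \hat{X}_k)$, $1 \le k \le n$. Then

$$\Delta_X = E D_1(\mathbf{X}, \hat{\mathbf{X}}) = \frac{1}{n} \sum_{k=1}^{n} \Delta_{1k}. \tag{55a}$$

We now write

$$\begin{aligned} \frac{1}{n} \log M_1 &\overset{(1)}{\ge} \frac{1}{n} H(\hat{\mathbf{X}} \mid S_0) \overset{(2)}{\ge} \frac{1}{n} I(\mathbf{X}; \hat{\mathbf{X}} \mid S_0) \\ &\overset{(3)}{=} \frac{1}{n} \sum_{k=1}^{n} I(X_k; \hat{\mathbf{X}} \mid S_0, X_1, \ldots, X_{k-1}) \\ &\overset{(4)}{\ge} \frac{1}{n} \sum_{k=1}^{n} I(X_k; \hat{X}_k \mid S_0, X_1, \ldots, X_{k-1}) \\ &\overset{(5)}{\ge} \frac{1}{n} \sum_{k=1}^{n} R_{X_k|V_k}(\Delta_{1k}) \overset{(6)}{\ge} \frac{1}{n} \sum_{k=1}^{n} R_{X_k|W_k}(\Delta_{1k}), \end{aligned} \tag{55b}$$

where $V_k = (S_0, X_1, \ldots, X_{k-1})$, and
$W_k = (V_k, Y_1, \ldots, Y_{k-1})$ as above. These steps are justified
as follows:

(1) Since $\hat{\mathbf{X}}$ is a function of $S_0$ and $S_1$, we have
that, conditioned on $S_0 = s_0$, $\hat{\mathbf{X}}$ can take no more
than $M_1$ values (since $S_1 \in I_{M_1}$). Thus,
$H(\hat{\mathbf{X}} \mid S_0 = s_0) \le \log M_1$, all $s_0$.

(2-4) Standard inequalities and identities.

(5) From the definition of $R_{X_k|V_k}(\Delta_{1k})$, since
$E d_1(X_k, \hat{X}_k) = \Delta_{1k}$.

(6) Follows from (36).

† We can, of course, take
$\mathcal{W}_k = I_{M_0} \times \mathcal{X}^{k-1} \times \mathcal{Y}^{k-1}$.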

A similar derivation yields

$$\frac{1}{n} \log M_2 \ge \frac{1}{n} \sum_{k=1}^{n} R_{Y_k|W_k}(\Delta_{2k}), \tag{56a}$$

where

$$\Delta_Y = \frac{1}{n} \sum_{k=1}^{n} \Delta_{2k}. \tag{56b}$$

We are now in a position to define $p^*(x, y, w)$. With $W_k$ defined
as above, let

$$p_k(x, y, w) = \Pr\{X_k = x, Y_k = y, W_k = w\}, \quad x \in \mathcal{X},\ y \in \mathcal{Y},\ w \in \mathcal{W}_k.$$

Let

$$p_{1k}(x, w) = \sum_{y} p_k(x, y, w), \qquad p_{2k}(y, w) = \sum_{x} p_k(x, y, w)$$

be the marginal distributions for $(X_k, W_k)$ and $(Y_k, W_k)$,
respectively. The $\{\mathcal{W}_k\}$ can be considered a class of
disjoint sets. Let $\mathcal{W} = \sum_k \mathcal{W}_k$, and define the
probability function on
$\mathcal{X} \times \mathcal{Y} \times \mathcal{W}$

$$p^*(x, y, w) = \frac{1}{n}\, p_k(x, y, w), \quad w \in \mathcal{W}_k,\ 1 \le k \le n.$$

Since

$$\sum_{w \in \mathcal{W}} p^*(x, y, w) = \sum_{k=1}^{n} \sum_{w \in \mathcal{W}_k} \frac{1}{n}\, p_k(x, y, w) = Q(x, y),$$

we have $p^* \in \mathcal{P}$. The random variables $X, Y, W$ are
defined by $p^*$ in the obvious way. We can think of $W$ as being
generated by choosing an integer $K \in [1, n]$ without bias, and
setting $W = W_k$ when $K = k$, $1 \le k \le n$. A straightforward
calculation yields

$$I(X, Y; W) = \frac{1}{n} \sum_{k=1}^{n} I(X_k, Y_k; W_k). \tag{57a}$$
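
The "straightforward calculation" behind (57a) can be confirmed
numerically. In the sketch below the pair $(x, y)$ is flattened into a
single symbol `u`, each $p_k$ is a dictionary `{(u, w): probability}`
with a common `u`-marginal, and the disjointness of the
$\mathcal{W}_k$ is enforced by tagging each `w` with its index `k`.
These representations and names are illustrative assumptions, and
logarithms are taken to base 2.

```python
import math

def mutual_information(joint):
    """I(U; W) in bits for a joint pmf given as {(u, w): probability}."""
    pu, pw = {}, {}
    for (u, w), p in joint.items():
        pu[u] = pu.get(u, 0.0) + p
        pw[w] = pw.get(w, 0.0) + p
    return sum(p * math.log2(p / (pu[u] * pw[w]))
               for (u, w), p in joint.items() if p > 0)

def mixture(joints):
    """Build p* = (1/n) p_k on the disjoint union of the W_k alphabets."""
    n = len(joints)
    mixed = {}
    for k, joint in enumerate(joints):
        for (u, w), p in joint.items():
            mixed[(u, (k, w))] = p / n  # the tag k keeps the alphabets disjoint
    return mixed
```

Given a list `joints` of such dictionaries,
`mutual_information(mixture(joints))` matches the average of
`mutual_information(j)` over the list, up to rounding.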
Furthermore, Lemma 5 can be applied to $p_{1k}$, $p_{2k}$ to yield

$$R_{X|W}(\Delta_X) \le \frac{1}{n} \sum_{k=1}^{n} R_{X_k|W_k}(\Delta_{1k}), \tag{57b}$$

$$R_{Y|W}(\Delta_Y) \le \frac{1}{n} \sum_{k=1}^{n} R_{Y_k|W_k}(\Delta_{2k}). \tag{57c}$$

Relations (57a, b, c) can be substituted into (54), (55), (56),
respectively, to obtain

$$\frac{1}{n} \log M_0 \ge I(X, Y; W),$$
$$\frac{1}{n} \log M_1 \ge R_{X|W}(\Delta_X),$$
$$\frac{1}{n} \log M_2 \ge R_{Y|W}(\Delta_Y),$$

which is (52). This completes the proof of the converse.

3.2 The direct half 

We begin the proof by stating a lemma concerning conventional
source coding for a single memoryless source. The source is defined by
a random variable $X \in \mathcal{X}$, with probability distribution
$Q_X(x)$, and a reproducing alphabet $\hat{\mathcal{X}}$ with distortion
function $d_1(x, \hat{x})$. As above,
$\mathbf{X} = (X_1, \ldots, X_n)$ are $n$ independent copies of $X$. Let
$Q_X^{(n)}(\mathbf{x}) = \prod_{k=1}^{n} Q_X(x_k)$ be the probability
distribution for $\mathbf{X}$. Let $R(\Delta)$ be the rate-distortion
function.

A source code with parameters $(n, M)$ may be thought of as a
mapping $f: \mathcal{X}^n \to \mathcal{C} \subseteq \hat{\mathcal{X}}^n$,
where $\operatorname{card} \mathcal{C} \le M$. Let
$\hat{\mathbf{X}} = (\hat{X}_1, \ldots, \hat{X}_n) = f(\mathbf{X})$.
Then
$D_1(\mathbf{X}, \hat{\mathbf{X}}) = (1/n) \sum_{k=1}^{n} d_1(X_k, \hat{X}_k)$
is a random variable. We are interested in the quantity

$$\Gamma(\Delta + \delta) = \Pr\{D_1(\mathbf{X}, \hat{\mathbf{X}}) > \Delta + \delta\} = \sum_{\mathbf{x}} Q_X^{(n)}(\mathbf{x})\, \Phi(\mathbf{x}), \tag{58}$$

where $\Phi(\mathbf{x}) = 1$ if
$D_1[\mathbf{x}, f(\mathbf{x})] > \Delta + \delta$, and
$\Phi(\mathbf{x}) = 0$ otherwise. We now state a lemma, which follows
immediately from Lemma 9.3.1 and inequality (9.3.31) of Gallager
(Ref. 4).

Lemma 10: Let $\Delta \ge 0$, and $\epsilon, \delta > 0$ be arbitrary.
Then there exist $A, B > 0$ such that for all $n = 1, 2, \ldots$ there
exists a code with parameters $(n, M)$ satisfying

$$M \le \exp_2 \{ n [R(\Delta) + \epsilon] \}$$

and

$$\Gamma(\Delta + \delta) = \Pr\{D_1(\mathbf{X}, \hat{\mathbf{X}}) > \Delta + \delta\} \le A e^{-Bn}.$$

With the aid of this lemma the standard source coding theorem
follows readily (Ref. 4, Theorem 9.3.1).

Next, let us consider a compound source for which the source output
in $n$ time units is an $n$-vector
$\mathbf{X} = (X_1, \ldots, X_n) \in \mathcal{X}^n$. The $\{X_k\}$ are
still independent, but the $X_k$ are not identically distributed.

Let $n_1, n_2, \ldots, n_J$ be such that $\sum_{j=1}^{J} n_j = n$, and
let $Q_1(\cdot), Q_2(\cdot), \ldots, Q_J(\cdot)$ be $J$ probability
distribution functions on $\mathcal{X}$. The source is characterized by
the fact that a known subset of $n_j$ of the $n$ coordinates of
$\mathbf{X}$ are distributed according to $Q_j(\cdot)$,
$j = 1, \ldots, J$. Let $R_j(\Delta)$ be the rate-distortion function
corresponding to $Q_j(\cdot)$ relative to the distortion function
$d_1$. A code is defined exactly as above, and
$\hat{\mathbf{X}} = f(\mathbf{X})$. We now have

Corollary 11: Let $\Delta_j \ge 0$, $j = 1, 2, \ldots, J$, and
$\epsilon, \delta > 0$ be arbitrary. Then there exist $A_j, B_j > 0$,
$j = 1, \ldots, J$, such that, for all $n = 1, 2, \ldots$ and any set
$\{n_j\}$ such that $\sum_j n_j = n$, there exists a code with
parameter $M$ satisfying

$$M \le \prod_{j=1}^{J} \exp_2 \{ n_j [R_j(\Delta_j) + \epsilon] \} = \exp_2 \Big\{ \sum_{j=1}^{J} n_j [R_j(\Delta_j) + \epsilon] \Big\} \tag{59a}$$

and

$$\Gamma(\Delta + \delta) = \Pr\{D_1(\mathbf{X}, \hat{\mathbf{X}}) > \Delta + \delta\} \le \sum_{j=1}^{J} A_j e^{-B_j n_j}, \tag{59b}$$

where $\Delta = n^{-1} \sum_j n_j \Delta_j$. The $(A_j, B_j)$'s are the
$(A, B)$ of Lemma 10 corresponding to $Q_j(\cdot)$.

The corollary follows immediately from Lemma 10 on noting that,
for any random variables $\{U_j\}$ and any set of constants $\{c_j\}$,

$$\Pr\Big\{ \sum_j U_j > \sum_j c_j \Big\} \le \sum_j \Pr\{U_j > c_j\}.$$

Let us also remark that the $Q^{(n)}(\mathbf{x})$ used to compute
$\Gamma(\Delta + \delta)$ in the corollary is of the form

$$Q^{(n)}(\mathbf{x}) = \prod_{k=1}^{n_1} Q_1(x_{i_{1k}}) \prod_{k=1}^{n_2} Q_2(x_{i_{2k}}) \cdots \prod_{k=1}^{n_J} Q_J(x_{i_{Jk}}), \tag{60}$$

where the $i_{jk}$th coordinate of $\mathbf{x}$ has distribution
$Q_j(\cdot)$, $1 \le k \le n_j$.
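Let us now turn to our network coding problem. An alternative
(though equivalent) way of defining a code for our network with
parameters $(n, M_0, M_1, M_2)$ is

(1) A mapping

$$g: \mathcal{X}^n \times \mathcal{Y}^n \to \mathcal{C}, \tag{61a}$$

where $\mathcal{C}$ is an arbitrary set with cardinality $\le M_0$. The
mapping $g$ is called a "core code."

(2) For each $\mathbf{w} \in \mathcal{C}$, a mapping

$$g_{\mathbf{w}}^{(1)}: \mathcal{X}^n \times \mathcal{Y}^n \to \mathcal{C}_{\mathbf{w}}^{(1)} \subseteq \hat{\mathcal{X}}^n, \tag{61b}$$

where $\operatorname{card} \mathcal{C}_{\mathbf{w}}^{(1)} \le M_1$.

(3) For each $\mathbf{w} \in \mathcal{C}$, a mapping

$$g_{\mathbf{w}}^{(2)}: \mathcal{X}^n \times \mathcal{Y}^n \to \mathcal{C}_{\mathbf{w}}^{(2)} \subseteq \hat{\mathcal{Y}}^n, \tag{61c}$$

where $\operatorname{card} \mathcal{C}_{\mathbf{w}}^{(2)} \le M_2$.

The code defined in this way can be used on our network (Fig. 2) as
follows. Let $\mathcal{C} = \{\mathbf{w}_i\}_{i=1}^{M_0}$. Then, if
$g(\mathbf{x}, \mathbf{y}) = \mathbf{w}_i$, the index $i$ is
transmitted over the common channel. Let
$\mathcal{C}_{\mathbf{w}_i}^{(1)} = \{\hat{\mathbf{x}}_{il}\}_{l=1}^{M_1}$,
$1 \le i \le M_0$. Then, if
$g(\mathbf{x}, \mathbf{y}) = \mathbf{w}_i$ and
$g_{\mathbf{w}_i}^{(1)}(\mathbf{x}, \mathbf{y}) = \hat{\mathbf{x}}_{il}$,
we transmit the index $l$ over the private channel to receiver 1. The
decoder at receiver 1, knowing the indices $i$ and $l$, emits
$\hat{\mathbf{x}}_{il}$, and the resulting distortion is
$D_1(\mathbf{x}, \hat{\mathbf{x}}_{il})$. Receiver 2 works analogously.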
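
The index flow just described can be sketched as follows. This is a
toy illustration rather than the construction used in the proof: the
core code and the private codes are plain nearest-neighbor searches
under Hamming distance, the private encoders look only at their own
component, and all names (`core`, `private1`, `private2`) are
hypothetical.

```python
def hamming(a, b):
    """Average per-letter Hamming distortion between equal-length tuples."""
    return sum(s != t for s, t in zip(a, b)) / len(a)

def encode(x, y, core, private1, private2):
    """Return (i, l1, l2): the common index and the two private indices.

    core        -- list of core code vectors w_i (here tuples of length 2n)
    private1[i] -- codebook for receiver 1 associated with w_i
    private2[i] -- codebook for receiver 2 associated with w_i
    """
    i = min(range(len(core)), key=lambda j: hamming(x + y, core[j]))
    l1 = min(range(len(private1[i])), key=lambda j: hamming(x, private1[i][j]))
    l2 = min(range(len(private2[i])), key=lambda j: hamming(y, private2[i][j]))
    return i, l1, l2

def decode(i, l, private):
    """A receiver knows the common index i and its own private index l."""
    return private[i][l]
```

With `core = [(0, 0, 0, 0), (1, 1, 1, 1)]` and
`private1 = private2 = [[(0, 0), (0, 1)], [(1, 1), (1, 0)]]`, the input
`x = (0, 1)`, `y = (0, 0)` is sent as the indices `(0, 1, 0)`, and
receiver 1 reproduces `(0, 1)` with zero distortion.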

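Let us fix our attention on receiver 1, and assume that
$g_{\mathbf{w}}^{(1)}(\mathbf{x}, \mathbf{y}) = g_{\mathbf{w}}^{(1)}(\mathbf{x})$.
Then we define the quantity $q(\mathbf{x}, \mathbf{w}_i)$
($\mathbf{x} \in \mathcal{X}^n$, $\mathbf{w}_i \in \mathcal{C}$) as the
probability that $\mathbf{X} = \mathbf{x}$ and
$\mathbf{Y} = \mathbf{y}$ such that
$g(\mathbf{x}, \mathbf{y}) = \mathbf{w}_i$. Thus,

$$q(\mathbf{x}, \mathbf{w}_i) = \sum_{\mathbf{y}:\, g(\mathbf{x}, \mathbf{y}) = \mathbf{w}_i} Q^{(n)}(\mathbf{x}, \mathbf{y}). \tag{62}$$

$Q^{(n)}(\mathbf{x}, \mathbf{y}) = \prod_{k=1}^{n} Q(x_k, y_k)$ is the
probability distribution for $\mathbf{X}$ and $\mathbf{Y}$. Then, as in
(58) with $\hat{\mathbf{X}} = g_{\mathbf{W}}^{(1)}(\mathbf{X})$,
$\mathbf{W} = g(\mathbf{X}, \mathbf{Y})$,

$$\Gamma(\Delta_1 + \delta) = \Pr\{D_1(\mathbf{X}, \hat{\mathbf{X}}) > \Delta_1 + \delta\} = \sum_{i=1}^{M_0} \sum_{\mathbf{x}} q(\mathbf{x}, \mathbf{w}_i)\, \Phi_i(\mathbf{x}), \tag{63a}$$

where

$$\Phi_i(\mathbf{x}) = \begin{cases} 1, & D_1[\mathbf{x}, g_{\mathbf{w}_i}^{(1)}(\mathbf{x})] > \Delta_1 + \delta, \\ 0, & \text{otherwise}. \end{cases} \tag{63b}$$

Substituting (62) into (63), we obtain

$$\Gamma(\Delta_1 + \delta) = \sum_{i=1}^{M_0} \Big\{ \sum_{(\mathbf{x}, \mathbf{y}) \in G_i} Q^{(n)}(\mathbf{x}, \mathbf{y})\, \Phi_i(\mathbf{x}) \Big\}, \tag{64a}$$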



1712 THE BELL SYSTEM TECHNICAL JOURNAL, NOVEMBER 1974 



where

$$G_i = \{(\mathbf{x}, \mathbf{y}) : g(\mathbf{x}, \mathbf{y}) = \mathbf{w}_i\}, \quad 1 \le i \le M_0. \tag{64b}$$

Now our goal is to show that there exists a code for $n$ sufficiently
large, with $M_0$, $M_1$, $M_2$ appropriately chosen, and with
$\Gamma(\Delta_1 + \delta)$ arbitrarily small.

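Let us assume that we are given a probability distribution
$p(x, y, w) \in \mathcal{P}$, where $x \in \mathcal{X}$,
$y \in \mathcal{Y}$, $w \in \mathcal{W}$. Let
$p_W(w) = \sum_{x,y} p(x, y, w)$, $w \in \mathcal{W}$, be the marginal
distribution of $W$. Assume with no loss of generality that
$p_W(w) > 0$. Let

$$p_b(x, y \mid w) = \frac{p(x, y, w)}{p_W(w)}$$

be the "backward test channel." For $\mathbf{x} \in \mathcal{X}^n$,
$\mathbf{y} \in \mathcal{Y}^n$, $\mathbf{w} \in \mathcal{W}^n$, let

$$p^{(n)}(\mathbf{x}, \mathbf{y}, \mathbf{w}) = \prod_{k=1}^{n} p(x_k, y_k, w_k)$$

be the probability distribution for
$\mathbf{X}, \mathbf{Y}, \mathbf{W}$ ($n$ independent drawings of
$X, Y, W$). Let $p_W^{(n)}(\mathbf{w}) = \prod_{k=1}^{n} p_W(w_k)$, and
$p_b^{(n)}(\mathbf{x}, \mathbf{y} \mid \mathbf{w}) = \prod_{k=1}^{n} p_b(x_k, y_k \mid w_k)$.
For
$(\mathbf{x}, \mathbf{y}, \mathbf{w}) \in \mathcal{X}^n \times \mathcal{Y}^n \times \mathcal{W}^n$,
let

$$i^{(n)}(\mathbf{x}, \mathbf{y}; \mathbf{w}) = \log \frac{p_b^{(n)}(\mathbf{x}, \mathbf{y} \mid \mathbf{w})}{Q^{(n)}(\mathbf{x}, \mathbf{y})} = \sum_{k=1}^{n} \log \frac{p_b(x_k, y_k \mid w_k)}{Q(x_k, y_k)} \tag{65}$$

be the information "density." Of course,

$$E\, i^{(n)}(\mathbf{X}, \mathbf{Y}; \mathbf{W}) = I(\mathbf{X}, \mathbf{Y}; \mathbf{W}) = n I(X, Y; W).$$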
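
The information density of (65) and the identity for its expectation
can be illustrated as follows. The sketch assumes that $p$ is a
dictionary mapping triples `(x, y, w)` to strictly positive
probabilities and that logarithms are base 2; the function names are
hypothetical.

```python
import math

def marginals(p):
    """Return Q(x, y) and p_W(w) for a joint pmf {(x, y, w): probability}."""
    q, pw = {}, {}
    for (x, y, w), v in p.items():
        q[(x, y)] = q.get((x, y), 0.0) + v
        pw[w] = pw.get(w, 0.0) + v
    return q, pw

def information_density(xs, ys, ws, p):
    """i^(n)(x, y; w) of (65): a sum of per-letter log-likelihood ratios."""
    q, pw = marginals(p)
    # p_b(x, y | w) / Q(x, y) = p(x, y, w) / (p_W(w) Q(x, y))
    return sum(math.log2(p[(x, y, w)] / (pw[w] * q[(x, y)]))
               for x, y, w in zip(xs, ys, ws))

def mutual_information(p):
    """I(X, Y; W) = H(X, Y) + H(W) - H(X, Y, W), in bits."""
    q, pw = marginals(p)
    entropy = lambda d: -sum(v * math.log2(v) for v in d.values() if v > 0)
    return entropy(q) + entropy(pw) - entropy(p)
```

Averaging `information_density` over all length-$n$ sequences, weighted
by the product distribution, returns $n$ times `mutual_information(p)`.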
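Finally, let $\Delta_1 \ge 0$ be given and let
$\{\Delta_w\}_{w \in \mathcal{W}}$ satisfy

$$R_{X|W}(\Delta_1) = \sum_{w} p_W(w)\, R_{X|W=w}(\Delta_w), \tag{66a}$$

and

$$\Delta_1 = \sum_{w} p_W(w)\, \Delta_w. \tag{66b}$$

See (35). A similar expression can be written for $R_{Y|W}(\Delta_2)$.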

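We now return to our network coding problem. With $p \in \mathcal{P}$
given, we set out to construct a core code $g$ with certain desirable
properties. For any core code
$g: \mathcal{X}^n \times \mathcal{Y}^n \to \mathcal{C} = \{\mathbf{w}_i\}_{i=1}^{M_0} \subseteq \mathcal{W}^n$,
let $N_{iw}$ be the number of occurrences of symbol $w$ in code vector
$\mathbf{w}_i$, $1 \le i \le M_0$, $w \in \mathcal{W}$. The existence of
a desirable core code is assured by

Lemma 12: Let $p \in \mathcal{P}$ and $\epsilon > 0$ be arbitrary. Let
$I^* = I(X, Y; W)$ correspond to $p \in \mathcal{P}$. For $n$
sufficiently large, there exists a code $g$ as in (61a) such that

(i) $M_0 \le 2^{n(I^* + \epsilon)}$,

(ii) $\left| \dfrac{N_{iw}}{n} - p_W(w) \right| \le \epsilon \big[ \min_{w} p_W(w) \big]$, for all $w \in \mathcal{W}$,

(iii) $\Pr(S^c) = \sum_{(\mathbf{x}, \mathbf{y}) \in S^c} Q^{(n)}(\mathbf{x}, \mathbf{y}) \le \epsilon$,

where

$$S^c = \Big\{ (\mathbf{x}, \mathbf{y}) : \frac{1}{n}\, i^{(n)}(\mathbf{x}, \mathbf{y}; g(\mathbf{x}, \mathbf{y})) \le I^* - \epsilon \Big\}$$

is the complement of a set $S$, and
$i^{(n)}(\mathbf{x}, \mathbf{y}; \mathbf{w})$ is defined in (65).

We defer discussion on the proof of Lemma 12 to the end of this
section.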

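Let us suppose that $g$ is a code that satisfies conditions (i), (ii),
and (iii) of Lemma 12. Let
$\{g_{\mathbf{w}_i}^{(1)}\}_{i=1}^{M_0}$ be a family of encoders as in
(61b). Consider expression (64a). The term in braces satisfies

$$\sum_{(\mathbf{x}, \mathbf{y}) \in G_i} Q^{(n)}(\mathbf{x}, \mathbf{y})\, \Phi_i(\mathbf{x}) \le \sum_{(\mathbf{x}, \mathbf{y}) \in G_i \cap S} Q^{(n)}(\mathbf{x}, \mathbf{y})\, \Phi_i(\mathbf{x}) + \sum_{(\mathbf{x}, \mathbf{y}) \in G_i \cap S^c} Q^{(n)}(\mathbf{x}, \mathbf{y}). \tag{67}$$

But if $(\mathbf{x}, \mathbf{y}) \in G_i$ [i.e.,
$g(\mathbf{x}, \mathbf{y}) = \mathbf{w}_i$] and
$(\mathbf{x}, \mathbf{y}) \in S$, then

$$Q^{(n)}(\mathbf{x}, \mathbf{y}) \le 2^{-(I^* - \epsilon)n}\, p_b^{(n)}(\mathbf{x}, \mathbf{y} \mid \mathbf{w}_i),$$

so that the first summation in the right member of (67) can be
overbounded:

$$\begin{aligned} \sum_{(\mathbf{x}, \mathbf{y}) \in G_i \cap S} Q^{(n)}(\mathbf{x}, \mathbf{y})\, \Phi_i(\mathbf{x}) &\le 2^{-(I^* - \epsilon)n} \sum_{(\mathbf{x}, \mathbf{y}) \in G_i \cap S} p_b^{(n)}(\mathbf{x}, \mathbf{y} \mid \mathbf{w}_i)\, \Phi_i(\mathbf{x}) \\ &\le 2^{-(I^* - \epsilon)n} \sum_{\mathbf{x}, \mathbf{y}} p_b^{(n)}(\mathbf{x}, \mathbf{y} \mid \mathbf{w}_i)\, \Phi_i(\mathbf{x}) \\ &= 2^{-(I^* - \epsilon)n} \sum_{\mathbf{x}} p_b^{(n)}(\mathbf{x} \mid \mathbf{w}_i)\, \Phi_i(\mathbf{x}). \end{aligned} \tag{68}$$

Combining (68), (67), and (64), we have

$$\begin{aligned} \Gamma(\Delta_1 + \delta) &= \Pr\{D_1(\mathbf{X}, \hat{\mathbf{X}}) > \Delta_1 + \delta\} \\ &\le 2^{-(I^* - \epsilon)n} \sum_{i=1}^{M_0} \sum_{\mathbf{x}} p_b^{(n)}(\mathbf{x} \mid \mathbf{w}_i)\, \Phi_i(\mathbf{x}) + \sum_{i=1}^{M_0} \sum_{(\mathbf{x}, \mathbf{y}) \in G_i \cap S^c} Q^{(n)}(\mathbf{x}, \mathbf{y}) \\ &= 2^{-(I^* - \epsilon)n} \sum_{i=1}^{M_0} \sum_{\mathbf{x}} p_b^{(n)}(\mathbf{x} \mid \mathbf{w}_i)\, \Phi_i(\mathbf{x}) + \Pr(S^c). \end{aligned} \tag{69}$$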
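Now consider

$$p_b^{(n)}(\mathbf{x} \mid \mathbf{w}_i) = \prod_{k=1}^{n} p_b(x_k \mid w_{ik}),$$

where $\mathbf{w}_i = (w_{i1}, w_{i2}, \ldots, w_{in})$,
$1 \le i \le M_0$. With $N_{iw}$ as defined just above Lemma 12, we see
that (for a given $i$) $p_b^{(n)}(\mathbf{x} \mid \mathbf{w}_i)$ is the
same form as $Q^{(n)}(\mathbf{x})$ in (60) with $n_j = N_{iw}$. It then
follows from Corollary 11 that, with $\mathbf{w}_i$ held fixed, we can
find a source code for $\mathbf{X}$, i.e., a mapping
$g_{\mathbf{w}_i}^{(1)}$, with parameter $M = M_1$ such that, for
arbitrary $\epsilon, \delta > 0$,

$$M_1 \le \exp_2 \Big\{ \sum_{w} N_{iw} [R_{X|W=w}(\Delta_w) + \epsilon] \Big\}, \tag{70a}$$

and

$$\sum_{\mathbf{x}} p_b^{(n)}(\mathbf{x} \mid \mathbf{w}_i)\, \Phi_i(\mathbf{x}) \le \sum_{w} A_w 2^{-B_w N_{iw}}, \tag{70b}$$

where $\Phi_i(\mathbf{x})$ is defined in (63b) with
$\Delta_1 = n^{-1} \sum_w N_{iw} \Delta_w$, and $\{\Delta_w\}$
satisfying (66). The $\{(A_w, B_w)\}$ correspond to $p_b(x \mid w)$.
Further, since the $\{N_{iw}\}$ satisfy condition (ii) of Lemma 12,
(70) becomes [using (66)]

$$M_1 \le \exp_2 \Big\{ n \sum_{w} (p_W(w) + \epsilon)(R_{X|W=w}(\Delta_w) + \epsilon) \Big\} \le \exp_2 \{ n (R_{X|W}(\Delta_1) + \epsilon H(X) + \epsilon) \} \tag{71a}$$

and

$$\sum_{\mathbf{x}} p_b^{(n)}(\mathbf{x} \mid \mathbf{w}_i)\, \Phi_i(\mathbf{x}) \le \sum_{w} A_w 2^{-B_w n p_W(w)(1 - \epsilon)} \le C\, 2^{-nB(1 - \epsilon)}, \tag{71b}$$

where $C = (\operatorname{card} \mathcal{W}) \cdot \max_w A_w$ and
$B = \min_w B_w p_W(w)$. Substituting (71b) into (69) and using
conditions (i) and (iii) of Lemma 12, we have

$$\Gamma(\Delta_1 + \delta) \le 2^{-n(I^* - \epsilon)} M_0 \cdot C\, 2^{-nB(1 - \epsilon)} + \Pr(S^c) \le C\, 2^{-n(B - B\epsilon - 2\epsilon)} + \epsilon \to 0,$$

as $n \to \infty$ and then $\epsilon \to 0$.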

Since we can do an identical construction for $Y$, we have proved

Lemma 13: Let $p \in \mathcal{P}$, and let the corresponding
information be $I(X, Y; W) = I^*$. Let $\Delta_1, \Delta_2 \ge 0$ and
$\epsilon, \delta > 0$ be arbitrary. Then, for $n$ sufficiently large,
there exists a coding scheme as in (61) with parameters
$(n, M_0, M_1, M_2)$ such that

(i) $M_0 \le 2^{n(I^* + \epsilon)}$,

(ii) $M_1 \le 2^{n(R_{X|W}(\Delta_1) + \epsilon)}$,

(iii) $M_2 \le 2^{n(R_{Y|W}(\Delta_2) + \epsilon)}$,

and

(iv) $\Pr\{D_1(\mathbf{X}, \hat{\mathbf{X}}) > \Delta_1 + \delta\} \le \epsilon$,

(v) $\Pr\{D_2(\mathbf{Y}, \hat{\mathbf{Y}}) > \Delta_2 + \delta\} \le \epsilon$.

The following corollary follows from Lemma 13 in the usual way
(exactly as does Theorem 9.3.1 in Gallager, Ref. 4).

Corollary 14: Let $p \in \mathcal{P}$, and let the corresponding
information $I(X, Y; W) = I^*$. Then, for arbitrary
$\Delta_1, \Delta_2 \ge 0$, the rate-triple
$[I^*, R_{X|W}(\Delta_1), R_{Y|W}(\Delta_2)]$ is
$(\Delta_1, \Delta_2)$-achievable. Thus,
$\mathcal{R}^{(p)}(\Delta_1, \Delta_2) \subseteq \mathcal{R}(\Delta_1, \Delta_2)$,
for all $p \in \mathcal{P}$.

The direct half now follows on noting that, if $S_1 \subseteq S_2$ and
$S_2$ is closed, then the closure $S_1^c \subseteq S_2$. Thus,
$\mathcal{R}^*(\Delta_1, \Delta_2) = [\bigcup_p \mathcal{R}^{(p)}(\Delta_1, \Delta_2)]^c \subseteq \mathcal{R}(\Delta_1, \Delta_2)$,
which is what we had to prove.
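It remains to prove Lemma 12. Since the proof is nearly identical
to that of Lemma 9.3.1 in Gallager (Ref. 4), we will only outline the
steps. Let $\epsilon > 0$ be arbitrary. For
$\mathbf{w} \in \mathcal{W}^n$, let $N_w(\mathbf{w})$ be the number of
occurrences of symbol $w \in \mathcal{W}$ in the $n$-vector
$\mathbf{w}$. Then define

$$T(\epsilon) = \Big\{ \mathbf{w} \in \mathcal{W}^n : \text{for all } w \in \mathcal{W},\ \Big| \frac{N_w(\mathbf{w})}{n} - p_W(w) \Big| \le \epsilon \big[ \min_{w} p_W(w) \big] \Big\}.$$

Then, paralleling Gallager, there exists a mapping $g$ [as in (61a)]
for which

$$M_0 \le 2^{n(I^* + \epsilon)}$$

and

$$\Pr\Big\{ \frac{1}{n}\, i^{(n)}[\mathbf{X}, \mathbf{Y}; g(\mathbf{X}, \mathbf{Y})] \le I^* - \epsilon \ \text{ or } \ g(\mathbf{X}, \mathbf{Y}) \notin T(\epsilon) \Big\} \le P_t(A) + \exp\{ -e^{n(\epsilon - \epsilon_2)} \} \triangleq \xi(n),$$

where $\epsilon_2 > 0$ is arbitrary and

$$A = \Big\{ (\mathbf{X}, \mathbf{Y}, \mathbf{W}) : \text{either } \frac{1}{n}\, i^{(n)}(\mathbf{X}, \mathbf{Y}; \mathbf{W}) > I^* + \epsilon_2 \ \text{ or } \ \frac{1}{n}\, i^{(n)}(\mathbf{X}, \mathbf{Y}; \mathbf{W}) \le I^* - \epsilon \ \text{ or } \ \mathbf{W} \notin T(\epsilon) \Big\},$$

and $P_t(\cdot)$ is probability computed with respect to
$p(x, y, w) \in \mathcal{P}$. By the weak law of large numbers, if
$\epsilon_2 < \epsilon$, then $\xi(n) \to 0$ as $n \to \infty$.

Let the code whose existence we have just asserted be
$\{\mathbf{w}_i\}_{i=1}^{M_0}$. There must be at least one code vector,
say, $\mathbf{w}_1$, which belongs to $T(\epsilon)$. Now

$$\Pr\{ g(\mathbf{X}, \mathbf{Y}) \notin T(\epsilon) \} \le \xi(n).$$

If $\mathbf{x}$, $\mathbf{y}$ are such that
$g(\mathbf{x}, \mathbf{y}) \notin T(\epsilon)$, change
$g(\mathbf{x}, \mathbf{y})$ to $\mathbf{w}_1$. The new code has
$g(\mathbf{x}, \mathbf{y}) \in T(\epsilon)$ and

$$\Pr\Big\{ \frac{1}{n}\, i^{(n)}[\mathbf{X}, \mathbf{Y}; g(\mathbf{X}, \mathbf{Y})] \le I^* - \epsilon \Big\} \le 2\xi(n) \to 0.$$

Thus, this new code satisfies conditions (i), (ii), and (iii) of
Lemma 12.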
APPENDIX

A.1 Proof of the convexity of $\mathcal{R}(\Delta_1, \Delta_2)$

Let $\Delta_1$, $\Delta_2$ be given and held fixed. Write
$\mathcal{R}(\Delta_1, \Delta_2)$ as $\mathcal{R}$.

Theorem 15: $\mathcal{R}$ is convex.

Proof: The theorem follows by a "time-sharing" argument. Let
$\mathbf{R}^{(1)}, \mathbf{R}^{(2)} \in \mathcal{R}$, and
$0 \le \theta \le 1$. We must show that

$$\mathbf{R} \in \mathcal{R}, \tag{72a}$$

where

$$\mathbf{R} = \theta \mathbf{R}^{(1)} + (1 - \theta) \mathbf{R}^{(2)}. \tag{72b}$$

Let $(g_E, g_D^{(1)}, g_D^{(2)})$ and $(h_E, h_D^{(1)}, h_D^{(2)})$ be
codes with parameters
$(n_1, M_0^{(1)}, M_1^{(1)}, M_2^{(1)}, \Delta_X^{(1)}, \Delta_Y^{(1)})$
and
$(n_2, M_0^{(2)}, M_1^{(2)}, M_2^{(2)}, \Delta_X^{(2)}, \Delta_Y^{(2)})$,
respectively, where $\Delta_X^{(1)}, \Delta_X^{(2)} \le \Delta_1$ and
$\Delta_Y^{(1)}, \Delta_Y^{(2)} \le \Delta_2$. Say $\theta = A/B$, where
$A$, $B$ are integers, $0 \le A \le B < \infty$. We show how to
construct a code $(n, M_0, M_1, M_2, \Delta_X, \Delta_Y)$, where

$$\frac{1}{n} \log M_i = \theta \left( \frac{1}{n_1} \log M_i^{(1)} \right) + (1 - \theta) \left( \frac{1}{n_2} \log M_i^{(2)} \right), \tag{73}$$

($i = 0, 1, 2$), and $\Delta_X \le \Delta_1$, $\Delta_Y \le \Delta_2$.
This will establish (72) for rational $\theta$. Since the region
$\mathcal{R}$ is closed, (72) must hold for all $\theta$, establishing
Theorem 15.

We now define a code with block length $n = c n_1 + d n_2$, where
$c = A n_2$, $d = (B - A) n_1$. Let
$(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n$ be a
sequence of $n$ pairs. Partition this sequence into $c$ blocks of $n_1$
pairs and $d$ blocks of $n_2$ pairs. Encode-decode the first $c$ blocks
using encoder-decoder $(g_E, g_D^{(1)}, g_D^{(2)})$, and encode-decode
the remaining $d$ blocks using encoder-decoder
$(h_E, h_D^{(1)}, h_D^{(2)})$. Denote this combination encoder-decoder
by $(f_E, f_D^{(1)}, f_D^{(2)})$. Consider
$f_E(\mathbf{x}, \mathbf{y}) = (S_0, S_1, S_2)$. The quantity $S_i$
($i = 0, 1, 2$) takes values in a set with

$$(M_i^{(1)})^c (M_i^{(2)})^d = M_i$$

members. This set can, of course, be put in 1-1 correspondence with
$I_{M_i}$. Thus, for $i = 0, 1, 2$,

$$\frac{1}{n} \log M_i = \frac{c}{n} \log M_i^{(1)} + \frac{d}{n} \log M_i^{(2)} = \frac{\theta}{n_1} \log M_i^{(1)} + \frac{1 - \theta}{n_2} \log M_i^{(2)},$$

which is (73). Further, the new code has $\Delta_X \le \Delta_1$ and
$\Delta_Y \le \Delta_2$, so that the theorem follows.
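
The bookkeeping in this time-sharing construction is easy to verify with
exact arithmetic. In the sketch below each code is summarized by its
block length and its rate triple, i.e., the values
$(1/n_j) \log M_i^{(j)}$; the function name and this representation are
illustrative assumptions.

```python
from fractions import Fraction

def time_share(n1, rates1, n2, rates2, A, B):
    """Combine two codes with time-sharing fraction theta = A / B.

    rates1, rates2 -- per-letter rates (1/n_j) log M_i^(j), i = 0, 1, 2
    Returns the block length n and the per-letter rates of the new code.
    """
    c, d = A * n2, (B - A) * n1   # number of blocks handled by each code
    n = c * n1 + d * n2           # equals B * n1 * n2
    # log M_i = c * log M_i^(1) + d * log M_i^(2), with log M_i^(j) = n_j * rate
    rates = tuple((c * n1 * r1 + d * n2 * r2) / n
                  for r1, r2 in zip(rates1, rates2))
    return n, rates
```

With `n1 = 3`, `n2 = 5`, `A = 1`, `B = 4`, the combined block length is
60 and each combined rate is exactly
$\tfrac{1}{4} r^{(1)} + \tfrac{3}{4} r^{(2)}$, in agreement with (73).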



A.2 Proof of Theorem 1 

Again let $\Delta_1, \Delta_2 \ge 0$ be given and fixed, and write
$\mathcal{R}(\Delta_1, \Delta_2) = \mathcal{R}$, and
$\overline{\mathcal{R}}(\Delta_1, \Delta_2) = \overline{\mathcal{R}}$.

We first establish part (ii) of the theorem. Let
$\mathbf{R} \in S(\alpha)$, $\alpha \in \mathcal{A}'$. If
$\mathbf{R} \notin \overline{\mathcal{R}}$, then there exists an
$\hat{\mathbf{R}} = (\hat{R}_0, \hat{R}_1, \hat{R}_2) \in \mathcal{R}$
such that $\hat{R}_i \le R_i$, $i = 0, 1, 2$, and at least one of these
inequalities holds strictly. Thus,

$$C(\alpha, \hat{\mathbf{R}}) - C(\alpha, \mathbf{R}) = (\hat{R}_0 - R_0) + \alpha_1 (\hat{R}_1 - R_1) + \alpha_2 (\hat{R}_2 - R_2) < 0. \tag{74}$$

The inequality follows from $\alpha_1, \alpha_2 > 0$. This contradicts
$\mathbf{R} \in S(\alpha)$. Thus
$\mathbf{R} \in \overline{\mathcal{R}}$, which establishes part (ii).

It remains to establish part (i). We must first obtain some
preliminary facts.

Lemma 16: Let $(R_0, R_1, R_2) \in \mathcal{R}$. Then

(a) for $a_i \ge 0$ ($i = 0, 1, 2$),
$(R_0 + a_0, R_1 + a_1, R_2 + a_2) \in \mathcal{R}$,

(b) for $0 \le \theta \le 1$,
$[(1 - \theta) R_0, R_1 + \theta R_0, R_2 + \theta R_0] \in \mathcal{R}$,

(c) for $0 \le \theta_1, \theta_2 \le 1$,
$[R_0 + \theta_1 R_1 + \theta_2 R_2, (1 - \theta_1) R_1, (1 - \theta_2) R_2] \in \mathcal{R}$.

Proof:

(a) follows immediately from the definition of $\mathcal{R}$.

(b) follows on noting that data sent through the common channel
can be transmitted instead through each private channel.

(c) follows on noting that any data transmitted through either
private channel can be transmitted instead over the common channel.
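
The three rate transformations of Lemma 16, together with the cost
$C(\alpha, \mathbf{R}) = R_0 + \alpha_1 R_1 + \alpha_2 R_2$ that appears
in (74), can be written out as below. The function names are
hypothetical; the sketch only performs the arithmetic and does not test
membership in $\mathcal{R}$.

```python
def enlarge(R, a):
    """Lemma 16(a): add nonnegative slack a = (a0, a1, a2) to each rate."""
    return tuple(r + s for r, s in zip(R, a))

def common_to_private(R, theta):
    """Lemma 16(b): move a fraction theta of the common rate onto both private channels."""
    r0, r1, r2 = R
    return ((1 - theta) * r0, r1 + theta * r0, r2 + theta * r0)

def private_to_common(R, theta1, theta2):
    """Lemma 16(c): move fractions of the private rates onto the common channel."""
    r0, r1, r2 = R
    return (r0 + theta1 * r1 + theta2 * r2, (1 - theta1) * r1, (1 - theta2) * r2)

def cost(alpha, R):
    """C(alpha, R) = R0 + alpha1 * R1 + alpha2 * R2, as in (74)."""
    return R[0] + alpha[0] * R[1] + alpha[1] * R[2]
```

For $\mathbf{R} = (1, 2, 3)$ and $\alpha = (1.5, 0.5)$, moving all of
$R_1$ to the common channel lowers the cost from 5.5 to 4.5; for
$\alpha = (0.25, 0.25)$, moving all of $R_0$ to the private channels
lowers it from 2.25 to 1.75.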
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Next, for R h Rz ^ write r = (R h R 2 ), and define the function 

F(t) = F(Ri, Ri) = min{#„: (B 0> Ri, Ri) 6 «}• (75) 

The minimum exists because (ft is closed. Clearly, (R , R\, Ri) G (R 
only if R = F(R lt fl 2 ). 

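Lemma 17: $F(\mathbf{r})$ is convex.

Proof: Let $\mathbf{r}^{(1)}, \mathbf{r}^{(2)}$ be arbitrary. Then $[F(\mathbf{r}^{(1)}), \mathbf{r}^{(1)}], [F(\mathbf{r}^{(2)}), \mathbf{r}^{(2)}] \in \mathcal{R}$.
Since $\mathcal{R}$ is convex, for $0 \le \theta \le 1$,

$$\theta[F(\mathbf{r}^{(1)}), \mathbf{r}^{(1)}] + (1 - \theta)[F(\mathbf{r}^{(2)}), \mathbf{r}^{(2)}] = [\theta F(\mathbf{r}^{(1)}) + (1 - \theta)F(\mathbf{r}^{(2)}), \theta\mathbf{r}^{(1)} + (1 - \theta)\mathbf{r}^{(2)}] \in \mathcal{R}.$$

Thus, by the definition of $F(\cdot)$,

$$F[\theta\mathbf{r}^{(1)} + (1 - \theta)\mathbf{r}^{(2)}] \le \theta F(\mathbf{r}^{(1)}) + (1 - \theta)F(\mathbf{r}^{(2)}),$$

establishing the lemma.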

Now it follows from the convexity of $F(\cdot)$ that, for arbitrary
$\mathbf{r}^* = (R_1^*, R_2^*)$, $R_1^*, R_2^* \ge 0$, there exist constants $\alpha_i = \alpha_i(\mathbf{r}^*)$, $i = 1, 2$,
such that, for all $\mathbf{r}$,

$$F(\mathbf{r}) - F(\mathbf{r}^*) \ge \sum_{i=1}^{2} \alpha_i (R_i^* - R_i). \tag{76}$$

This is a statement of the well-known fact that any convex curve lies
above a plane of support. Here the curve is the locus of points in
$(R_0, R_1, R_2)$-space given by $R_0 = F(\mathbf{r}) = F(R_1, R_2)$, and the plane is
the locus of points $R_0 = F(\mathbf{r}^*) + \sum_{i=1}^{2} \alpha_i (R_i^* - R_i)$. Note that the curve
and the plane coincide at $\mathbf{r} = \mathbf{r}^*$.

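Now let $\mathbf{R}^* = (R_0^*, R_1^*, R_2^*) \in \underline{\mathcal{R}}$. Then $R_0^* = F(R_1^*, R_2^*)$. Let $\mathbf{R} =
(R_0, R_1, R_2)$ be any triple in $\mathcal{R}$. Then with $\mathbf{r} = (R_1, R_2)$, (76) yields

$$F(\mathbf{r}) + \alpha_1 R_1 + \alpha_2 R_2 \ge R_0^* + \alpha_1 R_1^* + \alpha_2 R_2^*.$$

Since, by definition of $F(\cdot)$, $F(\mathbf{r}) \le R_0$, the left side is at most
$R_0 + \alpha_1 R_1 + \alpha_2 R_2$. Because $\mathbf{R}^*$ itself belongs to $\mathcal{R}$, we have:

$$R_0^* + \alpha_1 R_1^* + \alpha_2 R_2^* = \min_{\mathbf{R} \in \mathcal{R}} (R_0 + \alpha_1 R_1 + \alpha_2 R_2) = \min_{\mathbf{R} \in \mathcal{R}} C(\boldsymbol{\alpha}, \mathbf{R}) = T(\boldsymbol{\alpha}),$$

where $C(\boldsymbol{\alpha}, \mathbf{R})$ and $T(\boldsymbol{\alpha})$ are defined by (12) and (13), respectively.
Thus, we have shown that, if the triple $\mathbf{R}^* \in \underline{\mathcal{R}}$, then $\mathbf{R}^* \in S(\boldsymbol{\alpha})$,
where $\boldsymbol{\alpha}$ need not necessarily belong to $\mathcal{A}$.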
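
Now suppose that $\mathbf{R}^* = (R_0^*, R_1^*, R_2^*) \in \underline{\mathcal{R}}$, and $\mathbf{R}^* \in S(\boldsymbol{\alpha})$ where,
say, $\alpha_1 < 0$. From Lemma 16(a) (with $a > 0$), $\hat{\mathbf{R}} = (R_0^*, R_1^* + a, R_2^*)
\in \mathcal{R}$, and

$$C(\boldsymbol{\alpha}, \hat{\mathbf{R}}) < C(\boldsymbol{\alpha}, \mathbf{R}^*),$$

which implies $\mathbf{R}^* \notin S(\boldsymbol{\alpha})$, a contradiction. Thus, $\alpha_1$ (and similarly
$\alpha_2$) $\ge 0$. Next, suppose $\mathbf{R}^* \in \underline{\mathcal{R}}$, and $\mathbf{R}^* \in S(\boldsymbol{\alpha})$ where, say, $\alpha_1 > 1$.
Then, from Lemma 16(c), $\hat{\mathbf{R}} = (R_0^* + R_1^*, 0, R_2^*) \in \mathcal{R}$, and

$$C(\boldsymbol{\alpha}, \hat{\mathbf{R}}) < C(\boldsymbol{\alpha}, \mathbf{R}^*),$$

again a contradiction. Thus, $\alpha_1$, and similarly $\alpha_2$, $\le 1$. Finally, suppose
that $\mathbf{R}^* \in \underline{\mathcal{R}}$ and $\mathbf{R}^* \in S(\boldsymbol{\alpha})$, where $\alpha_1 + \alpha_2 < 1$. By Lemma 16(b),
$\hat{\mathbf{R}} = (0, R_1^* + R_0^*, R_2^* + R_0^*) \in \mathcal{R}$, and

$$C(\boldsymbol{\alpha}, \hat{\mathbf{R}}) < C(\boldsymbol{\alpha}, \mathbf{R}^*),$$

a contradiction. Thus, $\alpha_1 + \alpha_2 \ge 1$. We conclude that

$$\underline{\mathcal{R}} \subseteq \bigcup_{\boldsymbol{\alpha} \in \mathcal{A}} S(\boldsymbol{\alpha}), \tag{77}$$

where $\mathcal{A}$ is the set of $\boldsymbol{\alpha} = (\alpha_1, \alpha_2)$ that satisfy $0 \le \alpha_1, \alpha_2 \le 1$, and
$\alpha_1 + \alpha_2 \ge 1$. This is part (i). This completes the proof of Theorem 1.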
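
The three contradictions above each rest on a one-line cost comparison. As a sketch, assuming the cost is $C(\boldsymbol{\alpha}, \mathbf{R}) = R_0 + \alpha_1 R_1 + \alpha_2 R_2$ (the form that appears in (74)) and writing $\hat{\mathbf{R}}$ for the competing triple in each case, the differences can be worked out explicitly:

```latex
\begin{align*}
\alpha_1 < 0,\quad \hat{\mathbf{R}} = (R_0^*,\, R_1^* + a,\, R_2^*):
  &\quad C(\boldsymbol{\alpha}, \hat{\mathbf{R}}) - C(\boldsymbol{\alpha}, \mathbf{R}^*) = \alpha_1 a, \\
\alpha_1 > 1,\quad \hat{\mathbf{R}} = (R_0^* + R_1^*,\, 0,\, R_2^*):
  &\quad C(\boldsymbol{\alpha}, \hat{\mathbf{R}}) - C(\boldsymbol{\alpha}, \mathbf{R}^*) = (1 - \alpha_1) R_1^*, \\
\alpha_1 + \alpha_2 < 1,\quad \hat{\mathbf{R}} = (0,\, R_1^* + R_0^*,\, R_2^* + R_0^*):
  &\quad C(\boldsymbol{\alpha}, \hat{\mathbf{R}}) - C(\boldsymbol{\alpha}, \mathbf{R}^*) = (\alpha_1 + \alpha_2 - 1) R_0^*.
\end{align*}
```

The first difference is negative for every $a > 0$. The second and third are strictly negative only when $R_1^* > 0$ and $R_0^* > 0$, respectively; when those rates vanish the two costs coincide, a degenerate case that this sketch does not treat.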

A.3 Proof of Lemma 5 

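Let $\{\Delta_k\}_1^n$ be given, and, for $k = 1, 2, \ldots, n$, let $q_{tk}(\hat{x} \mid x, w)$, $x \in \mathcal{X}$,
$\hat{x} \in \hat{\mathcal{X}}$, $w \in \mathcal{W}_k$, be a test channel for which

$$\sum_{x, \hat{x}, w} d(x, \hat{x})\, q_{tk}(\hat{x} \mid x, w)\, p_k(x, w) \le \Delta_k, \tag{78a}$$

and

$$I(X; \hat{X} \mid W_k) \le R_{X|W_k}(\Delta_k) + \epsilon, \tag{78b}$$

where $\epsilon > 0$ is arbitrary. For $w \in \mathcal{W} = \bigcup_{k=1}^{n} \mathcal{W}_k$, $x \in \mathcal{X}$, $\hat{x} \in \hat{\mathcal{X}}$,
define the test channel

$$q^*(\hat{x} \mid x, w) = q_{tk}(\hat{x} \mid x, w), \quad \text{for } w \in \mathcal{W}_k,\ 1 \le k \le n.$$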
Then 

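$$\sum_{x, \hat{x}, w} d(x, \hat{x})\, q^*(\hat{x} \mid x, w)\, p^*(x, w) = \sum_{k=1}^{n} \sum_{x, \hat{x}} \sum_{w \in \mathcal{W}_k} \frac{1}{n}\, d(x, \hat{x})\, q_{tk}(\hat{x} \mid x, w)\, p_k(x, w) \le \frac{1}{n} \sum_{k=1}^{n} \Delta_k.$$

Thus, corresponding to the distribution $p^*(x, w)\, q^*(\hat{x} \mid x, w)$,

$$I(X; \hat{X} \mid W) \ge R_{X|W}\Bigl(\frac{1}{n} \sum_{k=1}^{n} \Delta_k\Bigr). \tag{79}$$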

However, by a straightforward calculation, 

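$$I(X; \hat{X} \mid W) = \frac{1}{n} \sum_{k=1}^{n} I(X; \hat{X} \mid W_k) \le \frac{1}{n} \sum_{k=1}^{n} R_{X|W_k}(\Delta_k) + \epsilon. \tag{80}$$

The inequality follows from (78b). Combining (79) and (80) and letting
$\epsilon \to 0$, we have Lemma 5.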
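
The "straightforward calculation" behind the equality in (80) can be sketched as follows. This is a reconstruction under two assumptions that the proof appears to use but that are stated outside this section: the alphabets $\mathcal{W}_k$ are disjoint, and the mixture distribution is $p^*(x, w) = \frac{1}{n} p_k(x, w)$ for $w \in \mathcal{W}_k$.

```latex
\begin{align*}
I(X; \hat{X} \mid W)
  &= \sum_{w \in \mathcal{W}} \Pr\{W = w\}\, I(X; \hat{X} \mid W = w) \\
  &= \sum_{k=1}^{n} \sum_{w \in \mathcal{W}_k} \frac{1}{n} \Pr\{W_k = w\}\, I(X; \hat{X} \mid W_k = w) \\
  &= \frac{1}{n} \sum_{k=1}^{n} I(X; \hat{X} \mid W_k).
\end{align*}
```

The second line uses the fact that, for $w \in \mathcal{W}_k$, the conditional distribution of $(X, \hat{X})$ given $W = w$ under $p^* q^*$ equals the conditional distribution given $W_k = w$ under $p_k q_{tk}$, because the factor $\frac{1}{n}$ cancels in the conditioning.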